Abstract

This paper presents a sharp approximation of the density of long runs of a random walk conditioned on its end
value or by an average of a function of its summands as their number tends to infinity. In the large deviation
range of the conditioning event it extends the Gibbs conditional principle in the sense that it provides a
description of the distribution of the random walk on long subsequences. Approximation of the density of the runs
is also obtained when the conditioning event states that the end value of the random walk belongs to a thin or a
thick set with non void interior. The approximations hold either in probability under the conditional distribution
of the random walk, or in total variation norm between measures. Application of the approximation scheme to
the evaluation of rare event probabilities through Importance Sampling is provided. When the conditioning event
is in the zone of the central limit theorem it provides a tool for statistical inference in the sense that it produces
an effective way to implement the Rao-Blackwell theorem for the improvement of estimators; it also leads to
conditional inference procedures in models with nuisance parameters. An algorithm for the simulation of such
long runs is presented, together with an algorithm determining the maximal length for which the approximation
is valid up to a prescribed accuracy.

1 Context and scope 

This paper explores the asymptotic distribution of a random walk conditioned on its final value as the number of
summands increases. Denote $X_1^n := (X_1, \ldots, X_n)$ a set of $n$ independent copies of a real random variable $X$ with
density $p_X$ on $\mathbb{R}$ and $S_{1,n} := X_1 + \ldots + X_n$. We consider approximations of the density of the vector
$X_1^k = (X_1, \ldots, X_k)$ on $\mathbb{R}^k$ when $S_{1,n} = n a_n$ and $a_n$ is a convergent sequence. The integer valued sequence
$k := k_n$ is such that

$$0 \le \limsup_{n \to \infty} k/n \le 1 \tag{K1}$$

together with

$$\lim_{n \to \infty} n - k = \infty. \tag{K2}$$

Therefore we may consider the asymptotic behavior of the density of the trajectory of the random walk on long
runs. For sake of applications we also address the case when $S_{1,n}$ is substituted by $U_{1,n} := u(X_1) + \ldots + u(X_n)$
for some real valued measurable function $u$, and when the conditioning event writes $(U_{1,n} = u_{1,n})$ where $u_{1,n}/n$
converges as $n$ tends to infinity. A complementary result provides an estimation for the case when the conditioning
event is a large set in the large deviation range, $(U_{1,n} \in nA)$ where $A$ is a Borel set with non void interior with
$E u(X) < \operatorname{essinf} A$; two cases are considered, according to the local dimension of $A$ at its essential infimum point
$\operatorname{essinf} A$.

The interest in this question stems from various sources. When $k$ is fixed (typically $k = 1$) this is a version
of the Gibbs Conditional Principle which has been studied extensively for fixed $a_n \ne EX$, therefore under a
large deviation condition. Diaconis and Freedman [12] have considered this issue also in the case $k/n \to \theta$ for
$0 \le \theta < 1$, in connection with de Finetti's Theorem for exchangeable finite sequences. Their interest was related to
the approximation of the density of $X_1^k$ by the product density of the summands $X_i$'s, therefore on the permanence
of the independence of the $X_i$'s under conditioning. Their result is in the spirit of van Campenhout and Cover [21]
and to be paralleled with Csiszár's [8] asymptotic conditional independence result, when the conditioning event is
$(S_{1,n} > n a_n)$ with $a_n$ fixed and larger than $EX$. In the same vein and under the same large deviation condition
Dembo and Zeitouni [9] considered similar problems. This question is also of importance in Statistical Physics.
Numerous papers pertaining to structural properties of polymers deal with this issue, and we refer to den Hollander
and Weiss [10] and [11] for a description of those problems and related results. In the moderate deviation case
Ermakov [14] also considered a similar problem when $k = 1$.
Approximation of conditional densities is the basic ingredient for the numerical estimation of integrals through
improved Monte Carlo techniques. Rare event probabilities may be evaluated through Importance Sampling techniques;
efficient sampling schemes consist in the simulation of random variables under a proxy of a conditional
density, often pertaining to conditioning events of the form $(U_{1,n} > n a_n)$; optimizing these schemes has been a
motivation for this work.

In parametric statistical inference conditioning on the observed value of a statistic leads to a reduction of
the mean square error of some estimate of the parameter; the celebrated Rao-Blackwell and Lehmann-Scheffé
Theorems can be implemented when a simulation technique produces samples according to the distribution of the
data conditioned on the value of some observed statistic. In these applications the conditioning event is local and
when the statistic is of the form $U_{1,n}$ then the observed value $u_{1,n}$ satisfies $\lim_{n \to \infty} u_{1,n}/n = Eu(X)$. Such is the
case in exponential families when $U_{1,n}$ is a sufficient statistic for the parameter. Other fields of applications pertain
to parametric estimation where conditioning by the observed value of a sufficient statistic for a nuisance parameter
produces optimal inference through maximum likelihood in the conditioned model; in general this conditional
density is unknown; the approximation produced in this paper provides a tool for the solution of these problems.

Both for Importance Sampling and for the improvement of estimators, the approximation of the conditional
density of $X_1^k$ on long runs should be of a very special form: it has to be a density on $\mathbb{R}^k$, easy to simulate,
and the approximation should be sharp. For these applications the relative error of the approximation should be
small on the simulated paths only. Also for inference through maximum likelihood under nuisance parameter the
approximation has to be accurate on the sample itself and not on the entire space.

Our first set of results provides a very sharp approximation scheme; numerical evidence on exponential runs with
length $n = 1000$ provides a relative error of the approximation of order less than 100% for the density of the first 800
terms when evaluated on the sample paths themselves, thus on the significant part of the support of the conditional
density; this very sharp approximation rate is surprising in such a large dimensional space, and it illustrates the
fact that the conditioned measure occupies a very small part of the entire space. Therefore the approximation of
the density of $X_1^k$ is not performed on the sequence of entire spaces $\mathbb{R}^k$ but merely on a sequence of subsets of $\mathbb{R}^k$
which bear the trajectories of the conditioned random walk with probability going to 1 as $n$ tends to infinity; the
approximation is performed on typical paths.

The extension of our results from typical paths to the whole space $\mathbb{R}^k$ holds: convergence of the relative error on
large sets implies that the total variation distance between the conditioned measure and its approximation goes to 0
on the entire space. So our results provide an extension of Diaconis and Freedman [12] and Dembo and Zeitouni
[9] who considered the case when $k$ is of small order with respect to $n$; the conditions which are assumed in the
present paper are weaker than those assumed in the just cited works; however, in contrast with their results, we do
not provide explicit rates for the convergence to 0 of the total variation distance on $\mathbb{R}^k$.

It would have been of interest to consider sharper convergence criteria than the total variation distance; the
$\chi^2$-distance, which is the mean square relative error, cannot be bounded through our approach on the entire space
$\mathbb{R}^k$, since it is only handled on large sets of trajectories (whose probability goes to 1 as $n$ increases); this is not
sufficient to bound its expected value under the conditional sampling.

This paper is organized as follows. Section 2 presents the approximation scheme for the conditional density of $X_1^k$
under the point conditioning sequence $(S_{1,n} = n a_n)$. In Section 3, it is extended to the case when the conditioning
family of events writes $(U_{1,n} = u_{1,n})$. The value of $k$ for which this approximation is fair is discussed; an algorithm
for the implementation of this rule is proposed. Algorithms for the simulation of random variables under the
approximating scheme are also presented. Section 4 extends the results of Section 3 when conditioning on large
sets. Two applications are presented in Section 5; the first one pertains to Rao-Blackwellization of estimators, hence
on the application of the results of Section 3 when the point condition is such that $\lim_{n \to \infty} u_{1,n}/n = Eu(X)$; in
the second application the result of Section 4 is used to derive small variance estimators of rare event probabilities
through Importance Sampling; in this case the conditioning event is in the range of the large deviation scale.

The main steps of the proofs are in the core of the paper; some of the technicalities are left to the Appendix.

2 Random walks conditioned on their sum 

2.1 Notation and hypothesis 

In this section the point conditioning event writes

$$\mathcal{E}_n := (S_{1,n} = n a_n).$$

We assume that $X$ satisfies the Cramer condition, i.e. $X$ has a finite moment generating function $\Phi(t) := E \exp tX$
in a non void neighborhood of 0. Denote

$$m(t) := \frac{d}{dt} \log \Phi(t), \qquad s^2(t) := \frac{d}{dt} m(t), \qquad \mu_3(t) := \frac{d}{dt} s^2(t).$$

The values of $m(t)$, $s^2(t)$ and $\mu_3(t)$ are the expectation, the variance and the third centered moment of the tilted density

$$\pi^{\alpha}(x) := \frac{\exp tx}{\Phi(t)}\, p(x) \tag{1}$$



where $t$ is the only solution of the equation $m(t) = \alpha$ when $\alpha$ belongs to the support of $X$. Conditions on $\Phi$ which
ensure existence and uniqueness of $t$ are referred to as steepness properties; we refer to Barndorff-Nielsen [4], p. 153
and following, for all properties of moment generating functions used in this paper. Denote $\Pi^{\alpha}$ the probability
measure with density $\pi^{\alpha}$.
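As an illustration of how the tilting parameter can be obtained in practice, the following minimal sketch solves $m(t) = \alpha$ by bisection, relying only on the fact that $m$ is increasing since $m'(t) = s^2(t) > 0$. The function name `solve_tilt`, the bracket and the Exp(1) example (for which $\Phi(t) = 1/(1-t)$ and $m(t) = 1/(1-t)$, $t < 1$) are assumptions made for this example and are not part of the paper.

```python
def solve_tilt(mean_fn, target, lower, upper, tol=1e-12, max_iter=200):
    """Solve mean_fn(t) = target by bisection on [lower, upper].

    Assumes mean_fn is increasing on the bracket and that the bracket
    contains the solution (both are assumptions of this sketch).
    """
    if mean_fn(lower) > target or mean_fn(upper) < target:
        raise ValueError("bracket does not contain the solution")
    for _ in range(max_iter):
        mid = 0.5 * (lower + upper)
        if mean_fn(mid) < target:
            lower = mid          # solution lies to the right of mid
        else:
            upper = mid          # solution lies to the left of mid
        if upper - lower < tol:
            break
    return 0.5 * (lower + upper)


def exp_mean(t):
    """m(t) for X ~ Exp(1): derivative of -log(1 - t), valid for t < 1."""
    return 1.0 / (1.0 - t)


# Tilting parameter such that the tilted density has expectation 2.
t_alpha = solve_tilt(exp_mean, target=2.0, lower=-5.0, upper=0.99)
```

In this toy case the solution is also available in closed form, $t = 1 - 1/\alpha$, which gives a direct check of the numerical root.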

We also assume that the characteristic function of $X$ is in $L^r$ for some $r \ge 1$, which is necessary for the Edgeworth
expansions to be performed.

The probability measure of the random vector $X_1^n$ on $\mathbb{R}^n$ conditioned upon $\mathcal{E}_n$ is denoted $P_{na_n}$. We also denote
$P_{na_n}$ the corresponding distribution of $X_1^k$ conditioned upon $\mathcal{E}_n$; the vector $X_1^k$ then has a density with respect
to the Lebesgue measure on $\mathbb{R}^k$ for $1 \le k < n$, which will be denoted $p_{na_n}$. For a generic r.v. $Z$ with density
$p$, $p(Z = z)$ denotes the value of $p$ at point $z$. Hence, $p_{na_n}(x_1^k) = p(X_1^k = x_1^k \mid S_{1,n} = n a_n)$. The normal density
function on $\mathbb{R}$ with mean $\mu$ and variance $\tau$ at $x$ is denoted $\mathfrak{n}(\mu, \tau, x)$. When $\mu = 0$ and $\tau = 1$, the standard notation
$\mathfrak{n}(x)$ is used.

2.2 A first approximation result 

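We first put forward a simple result which provides an approximation of the density $p_{na_n}$ of the measure $P_{na_n}$ on
$\mathbb{R}^k$ when $k$ satisfies (K1) and (K2). For $i \le j$ denote

$$s_{i,j} := x_i + \ldots + x_j.$$

Denote $a := a_n$ omitting the index $n$ for clearness.

We make use of the following property which states the invariance of conditional densities under the tilting: For
$1 \le i \le j \le n$, for all $\alpha$ in the range of $X$, for all $u$ and $s$

$$p(S_{i,j} = u \mid S_{1,n} = s) = \pi^{\alpha}(S_{i,j} = u \mid S_{1,n} = s) \tag{2}$$

where $S_{i,j} := X_i + \ldots + X_j$ together with $S_{1,0} = s_{1,0} = 0$. By Bayes formula it holds

$$p_{na}(x_1^k) = \prod_{i=0}^{k-1} p(X_{i+1} = x_{i+1} \mid S_{i+1,n} = na - s_{1,i}) \tag{3}$$

$$= \prod_{i=0}^{k-1} \pi^{a}(X_{i+1} = x_{i+1})\, \frac{\pi^{a}(S_{i+2,n} = na - s_{1,i+1})}{\pi^{a}(S_{i+1,n} = na - s_{1,i})}$$

$$= \left[ \prod_{i=0}^{k-1} \pi^{a}(X_{i+1} = x_{i+1}) \right] \frac{\pi^{a}(S_{k+1,n} = na - s_{1,k})}{\pi^{a}(S_{1,n} = na)}. \tag{4}$$

Denote $\overline{S}_{k+1,n}$ and $\overline{S}_{1,n}$ the normalized versions of $S_{k+1,n}$ and $S_{1,n}$ under the sampling distribution $\Pi^{a}$. By (4)

$$p_{na}(x_1^k) = \left[ \prod_{i=0}^{k-1} \pi^{a}(X_{i+1} = x_{i+1}) \right] \sqrt{\frac{n}{n-k}}\; \frac{\pi^{a}\left( \overline{S}_{k+1,n} = \frac{ka - s_{1,k}}{s(t^a)\sqrt{n-k}} \right)}{\pi^{a}\left( \overline{S}_{1,n} = 0 \right)}.$$

A first order Edgeworth expansion is performed in both terms of the ratio in the above display; see Remark 5
hereunder. This yields, assuming (K1) and (K2).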
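Proposition 1 For all $x_1^k$ in $\mathbb{R}^k$,

$$p_{na}(x_1^k) = \left[ \prod_{i=0}^{k-1} \pi^{a}(X_{i+1} = x_{i+1}) \right] \sqrt{\frac{n}{n-k}}\; \frac{\mathfrak{n}\left( \frac{ka - s_{1,k}}{s(t^a)\sqrt{n-k}} \right)}{\mathfrak{n}(0)} \left[ 1 + \frac{\mu_3(t^a)}{6 s^3(t^a) \sqrt{n-k}}\, H_3\left( \frac{ka - s_{1,k}}{s(t^a)\sqrt{n-k}} \right) + O\left( \frac{1}{\sqrt{n-k}} \right) \right] \tag{5}$$

where $H_3(x) := x^3 - 3x$. The value of $t^a$ is defined through $m(t^a) = a$.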



Despite its appealing aspect, (5) is of poor value for applications, since it does not yield an explicit way to
simulate samples under a proxy of $p_{na}$ for large values of $k$. The other way is to construct the approximation of
$p_{na}$ by steps, approximating the terms in (3) one by one and using the invariance under the tilting at each step,
which introduces a product of different tilted densities in (4). This method produces a valid approximation of $p_{na}$
on subsets of $\mathbb{R}^k$ which bear the trajectories of the conditioned random walk with larger and larger probability, going
to 1 as $n$ tends to infinity.

This introduces the main focus of this paper.

2.3 A recursive approximation scheme 

We introduce a positive sequence $\epsilon_n$ which satisfies

$$\lim_{n \to \infty} \epsilon_n \sqrt{n-k} = \infty \tag{E1}$$

$$\lim_{n \to \infty} \epsilon_n (\log n)^2 = 0. \tag{E2}$$

It will be shown that $\epsilon_n (\log n)^2$ is the rate of accuracy of the approximating scheme.

We denote $a$ the generic term of the convergent sequence $(a_n)_{n \ge 1}$. For clearness the dependence in $n$ of all
quantities involved in the coming development is omitted in the notation.

2.3.1 Approximation of the density of the runs 

Define a density $g_{na}(y_1^k)$ on $\mathbb{R}^k$ as follows. Set

$$g_0(y_1 \mid y_0) := \pi^{a}(y_1)$$

with $y_0$ arbitrary, and for $1 \le i \le k-1$ define $g(y_{i+1} \mid y_1^i)$ recursively.
Set $t_i$ the unique solution of the equation

$$m_i := m(t_i) = \frac{n}{n-i} \left( a - \frac{s_{1,i}}{n} \right) \tag{6}$$

where $s_{1,i} := y_1 + \ldots + y_i$. The tilted adaptive family of densities $\pi^{m_i}$ is the basic ingredient of the derivation of
the approximating scheme. Let

$$s_i^2 := \frac{d^2}{dt^2} \left( \log E_{\pi^{m_i}} \exp tX \right)(0)$$

and

$$\mu_j^i := \frac{d^j}{dt^j} \left( \log E_{\pi^{m_i}} \exp tX \right)(0), \quad j = 3, 4$$

which are the second, third and fourth cumulants of $\pi^{m_i}$. Let

$$g(y_{i+1} \mid y_1^i) = C_i\, p_X(y_{i+1})\, \mathfrak{n}(\alpha\beta + a, \beta, y_{i+1}) \tag{7}$$

be a density where

$$\alpha = t_i + \frac{\mu_3^i}{2 s_i^4 (n-i-1)} \tag{8}$$

$$\beta = s_i^2 (n-i-1) \tag{9}$$

and $C_i$ is a normalizing constant.
Define

$$g_{na}(y_1^k) := g_0(y_1 \mid y_0) \prod_{i=1}^{k-1} g(y_{i+1} \mid y_1^i). \tag{10}$$
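To make the recursion (6) to (10) concrete, here is a minimal sketch that evaluates $\log g_{na}(y_1^k)$ when the summands are assumed to be Exp(1), so that $p_X(x) = e^{-x}$ on $x > 0$, $m(t) = 1/(1-t)$, and the tilted density with mean $m_i$ has variance $m_i^2$ and third cumulant $2 m_i^3$. In this special case the normalizing constant is available in closed form, $1/C_i = \exp(\beta/2 - \mu)\,\Phi_{\mathcal{N}}((\mu - \beta)/\sqrt{\beta})$ with $\mu = \alpha\beta + a$ and $\Phi_{\mathcal{N}}$ the standard normal distribution function (obtained by completing the square). The function names and the choice of the exponential case are assumptions made for this illustration only.

```python
import math


def std_normal_cdf(x):
    """Standard normal distribution function, accurate in the lower tail."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def log_normal_pdf(mean, var, x):
    """Logarithm of the normal density n(mean, var, x)."""
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)


def log_g_step(y_next, partial_sum, i, n, a):
    """log g(y_{i+1} | y_1^i) as in (7) for Exp(1) summands.

    partial_sum is s_{1,i} = y_1 + ... + y_i; requires n*a > partial_sum.
    """
    m_i = (n * a - partial_sum) / (n - i)             # equation (6)
    t_i = 1.0 - 1.0 / m_i                             # solves 1/(1 - t) = m_i
    s2 = m_i ** 2                                     # second cumulant
    mu3 = 2.0 * m_i ** 3                              # third cumulant
    beta = s2 * (n - i - 1)                           # equation (9)
    alpha = t_i + mu3 / (2.0 * s2 ** 2 * (n - i - 1))  # equation (8)
    mu = alpha * beta + a
    # Closed form of log(1/C_i); the cdf may underflow for very large n.
    log_inv_c = beta / 2.0 - mu + math.log(
        std_normal_cdf((mu - beta) / math.sqrt(beta)))
    return -log_inv_c - y_next + log_normal_pdf(mu, beta, y_next)


def log_g(y, n, a):
    """log g_na(y_1^k) as in (10), for a list y of k positive values."""
    log_density = -math.log(a) - y[0] / a             # g_0 = tilted Exp, mean a
    partial_sum = y[0]
    for i in range(1, len(y)):
        log_density += log_g_step(y[i], partial_sum, i, n, a)
        partial_sum += y[i]
    return log_density
```

Working on the log scale avoids underflow of the product in (10) for long runs; for other distributions $C_i$ generally has no closed form and has to be approximated numerically.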

We then have

Theorem 2 Assume (K1) and (K2) together with (E1) and (E2). Let $Y_1^n$ be a sample with density $p_{na}$. Then

$$p_{na}(Y_1^k) := p(X_1^k = Y_1^k \mid S_{1,n} = na) = g_{na}(Y_1^k) \left( 1 + o_{P_{na}}(\epsilon_n (\log n)^2) \right). \tag{11}$$

Proof. The proof uses Bayes formula to write $p(X_1^k = Y_1^k \mid S_{1,n} = na)$ as a product of $k$ conditional densities
of individual terms of the trajectory evaluated at $Y_1^k$. Each term of this product is approximated through an
Edgeworth expansion which together with the properties of $Y_1^k$ under $P_{na}$ concludes the proof. This proof is rather
long and we have deferred its technical steps to the Appendix.
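Denote $S_{1,0} = 0$ and $S_{1,i} := S_{1,i-1} + Y_i$. It holds

$$p(X_1^k = Y_1^k \mid S_{1,n} = na) = p(X_1 = Y_1 \mid S_{1,n} = na) \prod_{i=1}^{k-1} p(X_{i+1} = Y_{i+1} \mid X_1^i = Y_1^i, S_{1,n} = na) \tag{12}$$

$$= \prod_{i=0}^{k-1} p(X_{i+1} = Y_{i+1} \mid S_{i+1,n} = na - S_{1,i})$$

by independence of the r.v's $X_i$'s.
Define $t_i$ through

$$m(t_i) = \frac{n}{n-i} \left( a - \frac{S_{1,i}}{n} \right),$$

a function of the past r.v's $Y_1^i$, and set $m_i := m(t_i)$ and $s_i^2 := s^2(t_i)$. By (2)

$$p(X_{i+1} = Y_{i+1} \mid S_{i+1,n} = na - S_{1,i}) = \pi^{m_i}(X_{i+1} = Y_{i+1} \mid S_{i+1,n} = na - S_{1,i})$$

$$= \pi^{m_i}(X_{i+1} = Y_{i+1})\, \frac{\pi^{m_i}(S_{i+2,n} = na - S_{1,i+1})}{\pi^{m_i}(S_{i+1,n} = na - S_{1,i})}$$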

where we used the independence of the $X_j$'s under $\pi^{m_i}$. A precise evaluation of the dominating terms in this latest
expression is needed in order to handle the product (12).

Under the sequence of densities $\pi^{m_i}$ the i.i.d. r.v's $X_{i+1}, \ldots, X_n$ define a triangular array which satisfies a local
central limit theorem, and an Edgeworth expansion. Under $\pi^{m_i}$, $X_{i+1}$ has expectation $m_i$ and variance $s_i^2$. Center
and normalize both the numerator and denominator in the fraction which appears in the last display. Denote
$\overline{\pi}_{n-i-1}$ the density of the normalized sum $(S_{i+2,n} - (n-i-1) m_i) / (s_i \sqrt{n-i-1})$ when the summands are i.i.d.
with common density $\pi^{m_i}$. Accordingly $\overline{\pi}_{n-i}$ is the density of the normalized sum $(S_{i+1,n} - (n-i) m_i) / (s_i \sqrt{n-i})$
under i.i.d. $\pi^{m_i}$ sampling. Hence, evaluating both $\overline{\pi}_{n-i-1}$ and its normal approximation at point $Y_{i+1}$,

$$p(X_{i+1} = Y_{i+1} \mid S_{i+1,n} = na - S_{1,i}) = \frac{\sqrt{n-i}}{\sqrt{n-i-1}}\, \pi^{m_i}(X_{i+1} = Y_{i+1})\, \frac{\overline{\pi}_{n-i-1}\left( (m_i - Y_{i+1}) / (s_i \sqrt{n-i-1}) \right)}{\overline{\pi}_{n-i}(0)} \tag{13}$$

$$=: \frac{\sqrt{n-i}}{\sqrt{n-i-1}}\, \pi^{m_i}(X_{i+1} = Y_{i+1})\, \frac{N_i}{D_i}.$$

The sequence of densities $\overline{\pi}_{n-i-1}$ converges pointwise to the standard normal density under (E1) which implies that
$n - i$ tends to infinity for all $1 \le i \le k$, and an Edgeworth expansion to the order 5 is performed for the numerator
and the denominator. The main arguments used in order to obtain the order of magnitude of the involved quantities
are (i) a maximal inequality which controls the magnitude of $m_i$ for all $i$ between 0 and $k-1$ (Lemma 22), (ii) the
order of the maximum of the $Y_i$'s (Lemma 23). As proved in the Appendix,

$$\overline{\pi}_{n-i-1}\left( (m_i - Y_{i+1}) / (s_i \sqrt{n-i-1}) \right) = \mathfrak{n}\left( -Y_{i+1} / (s_i \sqrt{n-i-1}) \right) \cdot A \cdot B + O_{P_{na}}\left( \frac{1}{(n-i-1)^{3/2}} \right) \tag{14}$$
where



and 



^ - I ^--sf _ 15(m:0^ , OPn„((l°g")^) I ^^^^ 



2s*(ri— I— 1) ^ i-t-J-/ 

/3-' 

Bsj{n-i-l} 72s?(ra-i-l) 

The $O_{P_{na}}\left( \frac{1}{(n-i-1)^{3/2}} \right)$ term in (14) is uniform upon $(m_i - Y_{i+1}) / (s_i \sqrt{n-i-1})$. Turn back to (13) and perform the
same Edgeworth expansion in the denominator, which writes



The terms in $g(Y_{i+1} \mid Y_1^i)$ follow from an expansion in the ratio of the two expressions (14) and (17) above. The
gaussian contribution is explicit in (14) while the term $\exp\left( \frac{\mu_3^i}{2 s_i^4 (n-i-1)} Y_{i+1} \right)$ is the dominant term in $B$. Turning
to (13) and comparing with (11) it appears that the normalizing factor Ci in g(Yi^\\Yl') compensates the term 
'^^P {^ 2a^ (^n-l-\) ^ I where the term $(ti) comes from tt'"' (X^+i = Yi+x) ■ Further the product of the 

remaining terms in the above approximations in (14) and (17) builds the $1 + o_{P_{na}}(\epsilon_n (\log n)^2)$ approximation
rate, as claimed. Details are deferred to the Appendix. This yields

$$p(X_1^k = Y_1^k \mid S_{1,n} = na) = \left( 1 + o_{P_{na}}(\epsilon_n (\log n)^2) \right) g_0(Y_1 \mid Y_0) \prod_{i=1}^{k-1} g(Y_{i+1} \mid Y_1^i)$$

which completes the proof of the Theorem. ■

That the variation distance between $P_{na}$ and $G_{na}$ tends to 0 as $n \to \infty$ is stated in Section 3.

Remark 3 When the $X_i$'s are i.i.d. with a standard normal density, then the result in the above approximation
Theorem holds with $k = n-1$ stating that $p(X_1^{n-1} = x_1^{n-1} \mid S_{1,n} = na) = g_{na}(x_1^{n-1})$ for all $x_1^{n-1}$ in $\mathbb{R}^{n-1}$. This
extends to the case when they have an infinitely divisible distribution. However formula (11) holds true without the
error term only in the gaussian case. Similar exact formulas can be obtained for infinitely divisible distributions
using (12) making no use of tilting. Such formula is used to produce Tables 1, 2, 3 and 4 in order to assess the
validity of the selection rule for $k$ in the exponential case.

Remark 4 The density in (7) is a slight modification of $\pi^{m_i}$. The modification from $\pi^{m_i}(y_{i+1})$ to $g(y_{i+1} \mid y_1^i)$ is a
small shift in the location parameter depending both on $a$ and on the skewness of $p$, and a change in the variance:
large values of $X_{i+1}$ have smaller weight for large $i$, so that the distribution of $X_{i+1}$ tends to concentrate around
$m_i$ as $i$ approaches $k$.

Remark 5 In Theorem 2, as in Proposition 1, as in Theorem 8 or as in Lemma 23, we use an Edgeworth expansion
for the density of the normalized sum of the $(n-i)$-th row of some triangular array of row-wise independent r.v's with
common density. Consider the i.i.d. r.v's $X_1, \ldots, X_n$ with common density $\pi^{a}(x)$ where $a$ may depend on $n$ but
remains bounded. The Edgeworth expansion pertaining to the normalized density of $S_{1,n}$ under $\pi^{a}$ can be derived
following closely the proof given for example in Feller [15], p. 532 and following, substituting the cumulants of $p$ by
those of $\pi^{a}$. Denote $\varphi_a(z)$ the characteristic function of $\pi^{a}(x)$. Clearly for any $\delta > 0$ there exists $q_{a,\delta} < 1$ such that
$|\varphi_a(z)| < q_{a,\delta}$ for $|z| > \delta$ and since $a$ is bounded, $\sup_n q_{a,\delta} < 1$. Therefore the inequality (2.5) in Feller [15] p. 533 holds. With
ifin defined as in Feller [15], (2.6) holds with replaced by ipa and a by sit""); ('^■^) holds, which completes the proof 
of the Edgeworth expansion in the simple case. The proof goes in the same way for higher order expansions. 



2.3.2 Sampling under the approximation 

Applications of Theorem 2 in Importance Sampling procedures and in Statistics require a reverse result. So assume
that $Y_1^k$ is a random vector generated under $G_{na}$ with density $g_{na}$. Can we state that $g_{na}(Y_1^k)$ is a good
approximation for $p_{na}(Y_1^k)$? This holds true. We state a simple Lemma in this direction.
Let $\mathfrak{R}_n$ and $\mathfrak{S}_n$ denote two p.m's on $\mathbb{R}^n$ with respective densities $r_n$ and $s_n$.
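Lemma 6 Suppose that for some sequence $\epsilon_n$ which tends to 0 as $n$ tends to infinity

$$r_n(Y_1^n) = s_n(Y_1^n) \left( 1 + o_{\mathfrak{R}_n}(\epsilon_n) \right) \tag{18}$$

as $n$ tends to $\infty$. Then

$$s_n(Y_1^n) = r_n(Y_1^n) \left( 1 + o_{\mathfrak{S}_n}(\epsilon_n) \right). \tag{19}$$

Proof. Denote

$$A_{n,\epsilon_n} := \left\{ y_1^n : (1 - \epsilon_n)\, s_n(y_1^n) \le r_n(y_1^n) \le s_n(y_1^n)\, (1 + \epsilon_n) \right\}.$$

It holds for all positive $\delta$

$$\lim_{n \to \infty} \mathfrak{R}_n(A_{n,\delta\epsilon_n}) = 1.$$

Write

$$\mathfrak{R}_n(A_{n,\delta\epsilon_n}) = \int 1_{A_{n,\delta\epsilon_n}}(y_1^n)\, \frac{r_n(y_1^n)}{s_n(y_1^n)}\, s_n(y_1^n)\, dy_1^n.$$

Since

$$\mathfrak{R}_n(A_{n,\delta\epsilon_n}) \le (1 + \delta\epsilon_n)\, \mathfrak{S}_n(A_{n,\delta\epsilon_n})$$

it follows that

$$\lim_{n \to \infty} \mathfrak{S}_n(A_{n,\delta\epsilon_n}) = 1,$$

which proves the claim. ■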

As a direct by-product of Theorem 2 and Lemma 6 we obtain

Theorem 7 Assume (K1) and (K2) together with (E1) and (E2). Let $Y_1^k$ be a sample with density $g_{na}$. It holds

$$p_{na}(Y_1^k) = g_{na}(Y_1^k) \left( 1 + o_{G_{na}}(\epsilon_n (\log n)^2) \right).$$

3 Random walks conditioned by a function of their summands 

This section extends the above results to the case when the conditioning event writes

$$(U_{1,n} = u_{1,n}) \tag{20}$$

with

$$U_{1,n} := u(X_1) + \ldots + u(X_n)$$

where the function $u$ is real valued and the sequence $u_{1,n}/n$ converges. The characteristic function of the random
variable $u(X)$ is assumed to belong to $L^r$ for some $r \ge 1$. Let $p_U$ denote the density of $U = u(X)$.
Assume

$$\phi_U(t) := E \exp tU < \infty \tag{21}$$

for $t$ in a non void neighborhood of 0. Define the functions $m(t)$, $s^2(t)$ and $\mu_3(t)$ as the first, second and third
derivatives of $\log \phi_U(t)$.
Denote

$$\pi_U^{\alpha}(u) := \frac{\exp tu}{\phi_U(t)}\, p_U(u) \tag{22}$$

with $m(t) = \alpha$ and $\alpha$ belongs to the support of $P_U$, the distribution of $U$.
We also introduce the family of densities

$$\pi_u^{\alpha}(x) := \frac{\exp tu(x)}{\phi_U(t)}\, p_X(x). \tag{23}$$
3.1 Approximation of the density of the runs

Assume that the sequence $\epsilon_n$ satisfies (E1) and (E2).
Define a density $g_{u_{1,n}}(y_1^k)$ on $\mathbb{R}^k$ as follows. Set

$$m_0 := u_{1,n}/n$$

and

$$g_0(y_1 \mid y_0) := \pi_u^{m_0}(y_1) \tag{24}$$

with $y_0$ arbitrary and, for $1 \le i \le k-1$, define $g(y_{i+1} \mid y_1^i)$ recursively. Denote $u_{1,i} := u(y_1) + \ldots + u(y_i)$.

Set $t_i$ the unique solution of the equation

$$m_i := m(t_i) = \frac{u_{1,n} - u_{1,i}}{n-i} \tag{25}$$

and let

$$s_i^2 := \frac{d^2}{dt^2} \left( \log E_{\pi_U^{m_i}} \exp tU \right)(0)$$

and

$$\mu_j^i := \frac{d^j}{dt^j} \left( \log E_{\pi_U^{m_i}} \exp tU \right)(0), \quad j = 3, 4$$

which are the second, third and fourth cumulants of $\pi_U^{m_i}$. A density $g(y_{i+1} \mid y_1^i)$ is defined through

$$g(y_{i+1} \mid y_1^i) = C_i\, p_X(y_{i+1})\, \mathfrak{n}(\alpha\beta + m_0, \beta, u(y_{i+1})). \tag{26}$$

Here

$$\alpha = t_i + \frac{\mu_3^i}{2 s_i^4 (n-i-1)} \tag{27}$$

$$\beta = s_i^2 (n-i-1) \tag{28}$$

and $C_i$ is a normalizing constant.
Set

$$g_{u_{1,n}}(y_1^k) := g_0(y_1 \mid y_0) \prod_{i=1}^{k-1} g(y_{i+1} \mid y_1^i). \tag{29}$$

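Theorem 8 Assume (K1) and (K2) together with (E1) and (E2). Then (i)

$$p_{u_{1,n}}(Y_1^k) := p(X_1^k = Y_1^k \mid U_{1,n} = u_{1,n}) = g_{u_{1,n}}(Y_1^k) \left( 1 + o_{P_{u_{1,n}}}(\epsilon_n (\log n)^2) \right)$$

and (ii)

$$p_{u_{1,n}}(Y_1^k) = g_{u_{1,n}}(Y_1^k) \left( 1 + o_{G_{u_{1,n}}}(\epsilon_n (\log n)^2) \right).$$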

Proof. We only sketch the initial step of the proof of (i), which rapidly follows the same track as that in Theorem
2.

As in the proof of Theorem 2 evaluate

$$p(X_{i+1} = Y_{i+1} \mid U_{i+1,n} = u_{1,n} - U_{1,i}) = p_X(X_{i+1} = Y_{i+1})\, \frac{p_U(U_{i+2,n} = u_{1,n} - U_{1,i+1})}{p_U(U_{i+1,n} = u_{1,n} - U_{1,i})}$$

$$= \frac{p_X(X_{i+1} = Y_{i+1})}{p_U(U_{i+1} = u(Y_{i+1}))}\; \frac{p_U(U_{i+1} = u(Y_{i+1}))\, p_U(U_{i+2,n} = u_{1,n} - U_{1,i+1})}{p_U(U_{i+1,n} = u_{1,n} - U_{1,i})}.$$

Use the invariance of the conditional density with respect to the change of sampling defined by $\pi_U^{m_i}$ to obtain

$$p(X_{i+1} = Y_{i+1} \mid U_{i+1,n} = u_{1,n} - U_{1,i}) = \frac{p_X(X_{i+1} = Y_{i+1})}{p_U(U_{i+1} = u(Y_{i+1}))}\; \frac{\pi_U^{m_i}(U_{i+1} = u(Y_{i+1}))\, \pi_U^{m_i}(U_{i+2,n} = u_{1,n} - U_{1,i+1})}{\pi_U^{m_i}(U_{i+1,n} = u_{1,n} - U_{1,i})}$$

$$= p_X(X_{i+1} = Y_{i+1})\, \frac{e^{t_i u(Y_{i+1})}}{\phi_U(t_i)}\, \frac{\pi_U^{m_i}(U_{i+2,n} = u_{1,n} - U_{1,i+1})}{\pi_U^{m_i}(U_{i+1,n} = u_{1,n} - U_{1,i})}$$

and proceed through the Edgeworth expansions in the above expression, following verbatim the proof of Theorem
2. We omit details. The proof of (ii) follows from Lemma 6. ■

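We turn to a consequence of Theorem 8.

For all $\delta > 0$, let

$$E_{k,\delta} := \left\{ y_1^k \in \mathbb{R}^k : \left| \frac{p_{u_{1,n}}(y_1^k) - g_{u_{1,n}}(y_1^k)}{g_{u_{1,n}}(y_1^k)} \right| < \delta \right\}$$

which by Theorem 8 satisfies

$$\lim_{n \to \infty} P_{u_{1,n}}(E_{k,\delta}) = \lim_{n \to \infty} G_{u_{1,n}}(E_{k,\delta}) = 1. \tag{30}$$

It holds

$$\sup_{C \in \mathcal{B}(\mathbb{R}^k)} \left| P_{u_{1,n}}(C \cap E_{k,\delta}) - G_{u_{1,n}}(C \cap E_{k,\delta}) \right| \le \delta \sup_{C \in \mathcal{B}(\mathbb{R}^k)} \int_{C \cap E_{k,\delta}} g_{u_{1,n}}(y_1^k)\, dy_1^k \le \delta.$$

By (30)

$$\sup_{C \in \mathcal{B}(\mathbb{R}^k)} \left| P_{u_{1,n}}(C \cap E_{k,\delta}) - P_{u_{1,n}}(C) \right| < \eta_n$$

and

$$\sup_{C \in \mathcal{B}(\mathbb{R}^k)} \left| G_{u_{1,n}}(C \cap E_{k,\delta}) - G_{u_{1,n}}(C) \right| < \eta_n$$

for some sequence $\eta_n \to 0$; hence

$$\sup_{C \in \mathcal{B}(\mathbb{R}^k)} \left| P_{u_{1,n}}(C) - G_{u_{1,n}}(C) \right| < \delta + 2\eta_n$$

for all positive $\delta$. Applying Scheffé's Lemma, we have proved

Theorem 9 Under the hypotheses of Theorem 8 the total variation distance between $P_{u_{1,n}}$ and $G_{u_{1,n}}$ goes to 0 as
$n$ tends to infinity, and

$$\lim_{n \to \infty} \int \left| p_{u_{1,n}}(y_1^k) - g_{u_{1,n}}(y_1^k) \right| dy_1^k = 0.$$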
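The link between the supremum over Borel sets and the $L^1$ distance between densities used in this argument, namely $\sup_C |P(C) - G(C)| = \frac{1}{2} \int |p - g|$, can be checked numerically on a one-dimensional toy case. The sketch below is only an illustration with two normal densities; the function names, the integration grid and the choice of densities are assumptions and have nothing to do with the conditional densities of the paper.

```python
import math


def normal_pdf(mean, var, x):
    """Normal density n(mean, var, x)."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def total_variation(p, g, lower, upper, num_cells=50000):
    """Midpoint-rule approximation of 0.5 * integral of |p - g| on [lower, upper]."""
    width = (upper - lower) / num_cells
    total = 0.0
    for j in range(num_cells):
        x = lower + (j + 0.5) * width
        total += abs(p(x) - g(x))
    return 0.5 * total * width


# N(0,1) against N(1,1): the supremum is attained on the half line x < 1/2,
# which gives the closed form 2 * Phi(1/2) - 1 = erf(1 / (2 * sqrt(2))).
tv_numeric = total_variation(lambda x: normal_pdf(0.0, 1.0, x),
                             lambda x: normal_pdf(1.0, 1.0, x),
                             lower=-12.0, upper=13.0)
tv_exact = math.erf(0.5 / math.sqrt(2.0))
```

The two values agree up to the quadrature error, which is the one-dimensional analogue of the identity behind the last display of Theorem 9.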

Remark 10 This result is to be paralleled with Theorem 1.6 in Diaconis and Freedman [12] and Theorem 2.15 in
Dembo and Zeitouni [9] which provide a rate for this convergence for small $k$'s under some additional conditions
on the moment generating function of $U$.

3.1.1 Approximation under other sampling schemes 

In statistical applications the r.v.'s $Y_i$'s in Theorems 2 and 8 may at times be sampled under some other distribution
than $P_{na}$ or $G_{na}$.

Consider the following situation. 

The model consists in an exponential family $\mathcal{P} := \{P_{\theta,\eta}, (\theta, \eta) \in \mathcal{N}\}$ defined on $\mathbb{R}$ with canonical parametrization
$(\theta, \eta)$ and sufficient statistics $(t, u)$ defined on $\mathbb{R}$ through the densities

$$p_{\theta,\eta}(x) := \frac{dP_{\theta,\eta}(x)}{dx} = \exp\left[ \theta t(x) + \eta u(x) - K(\theta, \eta) \right] h(x). \tag{31}$$

We assume that both $\theta$ and $\eta$ belong to $\mathbb{R}$. The natural parameter space $\mathcal{N}$ is a convex set in $\mathbb{R}^2$ defined as the
domain of

$$k(\theta, \eta) := \exp\left[ K(\theta, \eta) \right] = \int \exp\left[ \theta t(x) + \eta u(x) \right] h(x)\, dx.$$

For the statistician, $\theta$ is the parameter of interest whereas $\eta$ is a nuisance one. The unknown parameter
generating the data set $X_1^n := (X_1, \ldots, X_n)$ observed as $x_1^n := (x_1, \ldots, x_n)$ is $(\theta_T, \eta_T)$.

Conditioning on a sufficient statistic for the nuisance parameter produces a new exponential family which is
free of $\eta$. For any $\theta$ denote $\hat{\eta}_\theta$ the MLE of $\eta_T$ in model (31) parametrized in $\eta$, when $\theta$ is fixed. A classical solution
for the estimation of $\theta_T$ consists in maximizing the likelihood

$$L(\theta \mid x_1^n) := \prod_{i=1}^{n} p_{\theta, \hat{\eta}_\theta}(x_i)$$

upon $\theta$. This approach produces satisfactory results when $\hat{\eta}_\theta$ is a consistent estimator of $\eta_\theta$. However for curved
exponential families, it may happen that for some $\theta$ the likelihood

$$\prod_{i=1}^{n} p_{\theta,\eta}(x_i)$$

is multimodal with respect to $\eta$ which may produce misestimation in $\hat{\eta}_\theta$, leading in turn to inconsistency in the
resulting estimates of $\theta_T$, see Sundberg [19].

Consider $g_{u_{1,n},(\theta,\eta)}$ defined through (29) for fixed $(\theta, \eta)$, with $u_{1,n} := u(x_1) + \ldots + u(x_n)$. Since $U_{1,n}$ is sufficient
for $\eta$, $p_{u_{1,n},(\theta,\eta)}$ is independent upon $\eta$ for all $k$. Assume at present that the density $g_{u_{1,n},(\theta,\eta)}$ on $\mathbb{R}^k$ approximates
$p_{u_{1,n},(\theta,\eta)}$ on the sample $x_1^n$ generated under $(\theta_T, \eta_T)$; it follows then that inserting any value $\eta_0$ in (29) does not
change the value of the resulting likelihood

$$L_{\eta_0}(\theta \mid x_1^k) := g_{u_{1,n},(\theta,\eta_0)}(x_1^k).$$

Optimizing $L_{\eta_0}(\theta \mid x_1^k)$ upon $\theta$ produces a consistent estimator of $\theta_T$. We refer to [6] for examples and discussion.

Let $Y_1^n$ be i.i.d. copies of $Z$ with distribution $Q$ and density $q$; assume that $Q$ satisfies the Cramer condition
$\int (\exp tx)\, q(x)\, dx < \infty$ for $t$ in a non void neighborhood of 0. Let $V_{1,n} := u(Y_1) + \ldots + u(Y_n)$ and define

$$q_{u_{1,n}}(y_1^k) := q(Y_1^k = y_1^k \mid V_{1,n} = u_{1,n})$$

with distribution $Q_{u_{1,n}}$. It then holds

Theorem 11 Assume (K1) and (K2) together with (E1) and (E2). Then, with the same hypotheses and notation
as in Theorem 8,

$$p(X_1^k = Y_1^k \mid U_{1,n} = u_{1,n}) = g_{u_{1,n}}(Y_1^k) \left( 1 + o_{Q_{u_{1,n}}}(\epsilon_n (\log n)^2) \right).$$
Also the total variation distance between „ and Pu^ ^ goes to as n tends to infinity. 

Proof. It is enough to check that Lemmas 21, 22 and 23 hold when $Y$ satisfies the Cramer condition. ■

Remark 12 In the previous discussion $Q = P_{\theta_T,\eta_T}$ and $X_1^n$ are independent copies of $X$ with distribution $P_{\theta,\eta_0}$.

3.2 How far is the approximation valid? 

This section provides a rule leading to an effective choice of the crucial parameter $k$ in order to achieve a given
accuracy bound for the relative error in Theorem 8 (ii). The accuracy of the approximation is measured through

$$ERE(k) := E_{G_{u_{1,n}}}\, 1_{D_k}(Y_1^k)\, \frac{p_{u_{1,n}}(Y_1^k) - g_{u_{1,n}}(Y_1^k)}{p_{u_{1,n}}(Y_1^k)} \tag{32}$$

and

$$VRE(k) := \mathrm{Var}_{G_{u_{1,n}}}\, 1_{D_k}(Y_1^k)\, \frac{p_{u_{1,n}}(Y_1^k) - g_{u_{1,n}}(Y_1^k)}{p_{u_{1,n}}(Y_1^k)} \tag{33}$$

respectively the expectation and the variance of the relative error of the approximating scheme when evaluated on

$$D_k := \left\{ y_1^k \in \mathbb{R}^k \text{ such that } \left| g_{u_{1,n}}(y_1^k) / p_{u_{1,n}}(y_1^k) - 1 \right| < \delta_n \right\}$$

with $\epsilon_n (\log n)^2 / \delta_n \to 0$ and $\delta_n \to 0$; therefore $G_{u_{1,n}}(D_k) \to 1$. The r.v's $Y_1^k$ are sampled under $g_{u_{1,n}}$. Note that
the density $p_{u_{1,n}}$ is usually unknown. The argument is somehow heuristic and informal; nevertheless the rule is
simple to implement and provides good results. We assume that the set $D_k$ can be substituted by $\mathbb{R}^k$ in the above
formulas, therefore assuming that the relative error has bounded variance, which would require quite a lot of work
to be proved under appropriate conditions, but which seems to hold, at least in all cases considered by the authors.
We keep the above notation omitting therefore any reference to $D_k$.

Consider a two-sigma confidence bound for the relative accuracy for a given $k$, defining

$$CI(k) := \left[ ERE(k) - 2\sqrt{VRE(k)},\; ERE(k) + 2\sqrt{VRE(k)} \right].$$

Let $\delta$ denote an acceptance level for the relative accuracy. Accept $k$ until $\delta$ belongs to $CI(k)$. For such $k$ the
relative accuracy is certified up to the level 5% roughly.
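The acceptance rule can be sketched in a few lines. The snippet below assumes that estimates of $ERE(k)$ and $VRE(k)$ are already available for $k = 1, 2, \ldots$, and it reads "accept $k$ until $\delta$ belongs to $CI(k)$" as "accept $k$ as long as the upper bound of $CI(k)$ stays below $\delta$"; this reading, the function name and the list layout are assumptions made for the illustration.

```python
import math


def select_run_length(ere, vre, delta):
    """Largest k such that the upper bound of CI(k') is below delta for all k' <= k.

    ere[j] and vre[j] are the estimates for k = j + 1 (hypothetical layout).
    Returns 0 when even k = 1 is rejected.
    """
    accepted = 0
    for j, (e, v) in enumerate(zip(ere, vre)):
        upper_bound = e + 2.0 * math.sqrt(max(v, 0.0))  # two-sigma upper bound
        if upper_bound >= delta:
            break                                       # delta has entered CI(k)
        accepted = j + 1
    return accepted


# Toy values: the relative error grows with k.
ere_values = [0.0, 0.01, 0.02, 0.05]
vre_values = [1e-6, 1e-6, 1e-4, 1e-3]
k_delta = select_run_length(ere_values, vre_values, delta=0.03)
```

With these toy inputs the rule stops at the third value, since its upper bound exceeds the acceptance level, and returns the previous length.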

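The calculation of $VRE(k)$ and $ERE(k)$ should be done as follows.
Write

$$VRE(k) = E_{P_X}\left( \frac{g_{u_{1,n}}^3(Y_1^k)}{p_{u_{1,n}}(Y_1^k)^2\, p_X(Y_1^k)} \right) - \left[ E_{P_X}\left( \frac{g_{u_{1,n}}^2(Y_1^k)}{p_{u_{1,n}}(Y_1^k)\, p_X(Y_1^k)} \right) \right]^2 =: A - B^2.$$

By Bayes formula

$$p_{u_{1,n}}(Y_1^k) = p_X(Y_1^k)\, \frac{n\, p\left( U_{k+1,n}/(n-k) = m(t_k) \right)}{(n-k)\, p\left( U_{1,n}/n = u_{1,n}/n \right)}. \tag{34}$$

The following Lemma holds; see Jensen [16] and Richter [18].

Lemma 13 Let $U_1, \ldots, U_n$ be i.i.d. random variables with common density $p_U$ on $\mathbb{R}$ and satisfying the Cramer
conditions with m.g.f. $\phi_U$. Then with $m(t) = u$

$$p(U_{1,n}/n = u) = \frac{\sqrt{n}\, \left( p_U(u) \right)^n}{s(t) \sqrt{2\pi}\, \left( \pi_U^{u}(u) \right)^n}\, (1 + o(1))$$

when $|u|$ is bounded.
Introduce

$$D := \left[ \frac{\pi_U^{m_0}(m_0)}{p_U(m_0)} \right]^{n}$$

and

$$N := \left[ \frac{\pi_U^{m_k}(m_k)}{p_U(m_k)} \right]^{(n-k)}$$

with $m_k$ defined in (25) and $m_0 = u_{1,n}/n$. Define $t$ by $m(t) = m_0$. By (34) and Lemma 13 it holds

$$p_{u_{1,n}}(Y_1^k) = \sqrt{\frac{n}{n-k}}\; p_X(Y_1^k)\, \frac{D}{N}\, \frac{s(t)}{s(t_k)}\, (1 + o(1)).$$

The approximation of $A$ is obtained through Monte Carlo simulation. Define

$$A(Y_1^k) := \frac{n-k}{n} \left( \frac{g_{u_{1,n}}(Y_1^k)}{p_X(Y_1^k)} \right)^3 \left( \frac{N}{D} \right)^2 \frac{s^2(t_k)}{s^2(t)} \tag{35}$$

and simulate $L$ i.i.d. samples $Y_1^k(l)$, each one made of $k$ i.i.d. replications under $p_X$. Set

$$\hat{A} := \frac{1}{L} \sum_{l=1}^{L} A(Y_1^k(l)).$$

We use the same approximation for $B$. Define

$$B(Y_1^k) := \sqrt{\frac{n-k}{n}} \left( \frac{g_{u_{1,n}}(Y_1^k)}{p_X(Y_1^k)} \right)^2 \frac{N}{D}\, \frac{s(t_k)}{s(t)} \tag{36}$$

and

$$\hat{B} := \frac{1}{L} \sum_{l=1}^{L} B(Y_1^k(l))$$

with the same $Y_1^k(l)$'s as above.
Set

$$\overline{VRE}(k) := \hat{A} - (\hat{B})^2 \tag{37}$$

which is a fair approximation of $VRE(k)$.

The curve $k \to \overline{ERE}(k)$ is a proxy for (32) and is obtained through

$$\overline{ERE}(k) := 1 - \hat{B}.$$

A proxy of $CI(k)$ can now be defined through

$$\overline{CI}(k) := \left[ \overline{ERE}(k) - 2\sqrt{\overline{VRE}(k)},\; \overline{ERE}(k) + 2\sqrt{\overline{VRE}(k)} \right]. \tag{38}$$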
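Once the per-sample quantities of (35) and (36) have been evaluated on the $L$ simulated paths, the proxies (37) and (38) reduce to two averages. The sketch below takes these per-sample values as given lists; the function name and the input layout are assumptions, and the evaluation of (35) and (36) themselves is left outside the example.

```python
import math


def ci_proxy(a_values, b_values):
    """Return (ERE proxy, VRE proxy, CI proxy) from per-sample values.

    a_values[l] and b_values[l] are assumed to hold the values of (35) and
    (36) on the l-th simulated path.
    """
    if not a_values or len(a_values) != len(b_values):
        raise ValueError("need two non-empty lists of equal length")
    a_hat = sum(a_values) / len(a_values)          # Monte Carlo estimate of A
    b_hat = sum(b_values) / len(b_values)          # Monte Carlo estimate of B
    vre = a_hat - b_hat ** 2                       # equation (37)
    ere = 1.0 - b_hat                              # proxy of (32)
    half_width = 2.0 * math.sqrt(max(vre, 0.0))    # guard against tiny negatives
    return ere, vre, (ere - half_width, ere + half_width)


ere, vre, ci = ci_proxy([1.0, 1.08], [1.0, 1.0])
```

The `max(vre, 0.0)` guard is a design choice of this sketch: with few samples the difference in (37) can be slightly negative because of Monte Carlo noise.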



We now check the validity of the just above approximation, comparing $\overline{CI}(k)$ with $CI(k)$ on a toy case.

Consider $u(x) = x$. The case when $p_X$ is a centered exponential distribution with variance 1 allows for an
explicit evaluation of $CI(k)$ making no use of Lemma 13. The conditional density $p_{na}$ is calculated analytically,
the density $g_{na}$ is obtained through (10), hence providing a benchmark for our proposal. The terms $A$ and $B$ are
obtained by Monte Carlo simulation following the algorithm presented hereunder. Tables 1, 2 and 3, 4 show the
increase in 5 w.r.t. k in the large deviation range, with a such that P (Si,„ > no) ~ 10~^. We have considered two 
cases, when $n = 100$ and when $n = 1000$. These tables show that the approximation scheme is quite accurate, since
the relative error is fairly small even when approximating events in high dimensional spaces. Also they show that
ERE and CI provide good tools for assessing the value of $k$.



Figure 1: $ERE(k)$ (solid line) along with upper and lower bound of $CI(k)$ (dotted line) as a function of $k$ with
n = 100 and a such that R, ~ 10"^. 




Figure 2: $ERE(k)$ (solid line) along with upper and lower bound of $CI(k)$ (dotted line) as a function of $k$ with
n = 100 and a such that P„ ~ 10"^. 
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Figure 3: ERE{k){solid line) along with upper and lower bound of C/(fc) (dotted line) as a function of k with 
n = 1000 and a such that P„ ~ 10"^. 
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Figure 4: ERE{k){so\id line) along with upper and lower bound of C/(A:) (dotted line) as a function of k with 
n = 1000 and a such that P„ ~ 10"*. 



First, we present two algorithms (Algorithms 1 and 2) which produce the curve $k \to \widehat{CI}(k)$. The resulting $k = k_\delta$
is the longest size of the runs which makes $g_{u_{1,n}}$ a good proxy for $p_{u_{1,n}}$.

The calculation of $g_{u_{1,n}}(y_1^k)$ above requires the value of

$$C_i^{-1} = \int p_X(x)\, n(\alpha\beta + m_0, \beta, u(x))\, dx.$$

This can be done through Monte Carlo simulation.

Remark 14 Solving $t_i = m^{-1}(m_i)$ might be difficult. It may happen that the reciprocal function of $m$ is at hand,
but even when $p_X$ is the Weibull density and $u(x) = x$, such is not the case. We can replace step * by

$$t_{i+1} := t_i - \frac{m(t_i) + u_i}{(n - i)\, s^2(t_i)}.$$

Indeed since

$$m(t_{i+1}) - m(t_i) = -\frac{1}{n - i}\,(m(t_i) + u_i)$$

use a first order approximation to derive that $t_{i+1}$ can be substituted by $\tau_{i+1}$ defined through

$$\tau_{i+1} := t_i - \frac{1}{(n - i)\, s^2(t_i)}\,(m(t_i) + u_i).$$

When $\lim_{n\to\infty} u_{1,n}/n = E[u(X)]$, the values of the function $s^2(\cdot)$ are close to $Var[u(X)]$ and the above
approximation is fair. For the large deviation case, the same argument applies, since $s^2(t_i)$ keeps close to $s^2(t^a)$.
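The first order substitution of Remark 14 can be illustrated on a case where $m^{-1}$ is explicit, so that the error of the shortcut is visible. The sketch below is only an illustration under stated assumptions: it uses the centered exponential distribution, for which $m(t) = t/(1-t)$ and $s^2(t) = 1/(1-t)^2$, and it takes the new target value of $m$ as given data rather than computing it from (25). The function names are hypothetical.

```python
import math

# Toy case (assumption): X = E - 1 with E ~ Exp(1), so that for t < 1
# m(t) = t / (1 - t) and s2(t) = m'(t) = 1 / (1 - t)**2.
def m(t):
    return t / (1.0 - t)

def s2(t):
    return 1.0 / (1.0 - t) ** 2

def m_inverse(value):
    """Exact reciprocal of m; available in closed form in this toy case only."""
    return value / (1.0 + value)

def first_order_update(t_prev, m_target):
    """One Newton-type step: t_prev + (m_target - m(t_prev)) / s2(t_prev)."""
    return t_prev + (m_target - m(t_prev)) / s2(t_prev)

t_prev = m_inverse(0.50)      # current tilting parameter
m_next = 0.505                # next target value of m, taken as given
exact = m_inverse(m_next)
approx = first_order_update(t_prev, m_next)
print(exact, approx, abs(exact - approx))
```

Since the increments of $m$ along a run are of order $1/(n-i)$, the error of one step is of second order in that increment, which is why the shortcut is expected to be harmless for long runs.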
Algorithm 1: Evaluation of $g_{u_{1,n}}(y_1^k)$

Input : $y_1^k$, $p_X$, $n$, $u_{1,n}$
Output : $g_{u_{1,n}}(y_1^k)$
Initialization:
  $t_0 \leftarrow m^{-1}(m_0)$;
  $g_0(y_1 | y_0) \leftarrow$ (24);
  $s_1^1 \leftarrow u(y_1)$;
Procedure :
  for $i \leftarrow 1$ to $k - 1$ do
    $m_i \leftarrow$ (25);
    $t_i \leftarrow m^{-1}(m_i)$ *;
    $\alpha \leftarrow$ (27);
    $\beta \leftarrow$ (28);
    Calculate $C_i$;
    $g(y_{i+1} | y_1^i) \leftarrow$ (26);
  end
  Compute $g_{u_{1,n}}(y_1^k) \leftarrow$ (29);
Return : $g_{u_{1,n}}(y_1^k)$





Algorithm 2: Calculation of $k_\delta$

Input : $p_X$, $\delta$, $n$, $u_{1,n}$, $L$
Output : $k_\delta$
Initialization: $k = 1$
Procedure :
  while $\delta \notin \widehat{CI}(k)$ do
    for $l \leftarrow 1$ to $L$ do
      Simulate $Y_1^k(l)$ i.i.d. with density $p_X$;
      $A(Y_1^k(l))$ := (35) using Algorithm 1;
      $B(Y_1^k(l))$ := (36) using Algorithm 1;
    end
    Calculate $\widehat{CI}(k) \leftarrow$ (38);
    $k := k + 1$;
  end
Return : $k_\delta := k$
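The stopping rule of Algorithm 2 can be sketched independently of the Monte Carlo step. In the sketch below the pair (estimated relative error, estimated variance) for a given $k$ is supplied by a user function; `toy_estimates` is a purely hypothetical stand-in with an increasing error curve, not the estimator built from (35) and (36). The rule returns the first $k$ whose interval (38) contains $\delta$, and a cap `k_max` (an added safeguard, not part of the algorithm) avoids an endless loop if the curve jumps over $\delta$.

```python
import math

def confidence_interval(ere, vre):
    """Interval of type (38): ere -/+ 2 * sqrt(vre)."""
    half_width = 2.0 * math.sqrt(vre)
    return ere - half_width, ere + half_width

def longest_run_length(delta, estimate_ere_vre, k_max):
    """Scan k = 1, 2, ... and stop at the first k whose interval contains delta."""
    for k in range(1, k_max + 1):
        ere, vre = estimate_ere_vre(k)
        low, high = confidence_interval(ere, vre)
        if low <= delta <= high:
            return k
    return k_max  # safeguard: delta was never captured by an interval

def toy_estimates(k, n=100):
    """Hypothetical error curve, increasing in k, with a small constant variance."""
    return (k / n) ** 2, 1e-6

k_delta = longest_run_length(0.05, toy_estimates, k_max=99)
print(k_delta)
```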
3.2.1 Simulation of typical paths of a random walk under a point conditioning

By Theorem 8 (ii), $g_{u_{1,n}}$ and the density $p_{u_{1,n}}$ get closer and closer on a family of subsets of $\mathbb{R}^k$ which bear
the typical paths of the random walk under the conditional density with probability going to 1 as n increases. By
Lemma 6 large sets under $P_{u_{1,n}}$ are also large sets under $G_{u_{1,n}}$. It follows that long runs of typical paths under
$p_{u_{1,n}}$ can be simulated as typical paths under $g_{u_{1,n}}$ defined in (29), at least for large n.

The simulation of a sample with $g_{u_{1,n}}$ can be fast and easy when $\lim_{n\to\infty} u_{1,n}/n = E[u(X)]$. Indeed the
r.v. $X_{i+1}$ with density $g(x_{i+1} | x_1^i)$ is obtained through a standard acceptance-rejection algorithm. The values
of the parameters which appear in the gaussian component of $g(x_{i+1} | x_1^i)$ in (7) are easily calculated, and the
dominating density can be chosen for all $i$ as $p_X$. The constant in the acceptance-rejection algorithm is then
$1/\sqrt{2\pi\beta}$. This is in contrast with the case when the conditioning value is in the range of a large deviation event,
i.e. $\lim_{n\to\infty} u_{1,n}/n \neq E[u(X)]$, which appears in a natural way in Importance sampling estimation for rare event
probabilities; then MCMC techniques can be used.

Denote $\mathfrak{N}$ the c.d.f. of a normal variate with parameters $(\mu, \sigma^2)$, and $\mathfrak{N}^{-1}$ its inverse.

Algorithm 3: Simulation of Y with density proportional to $p(x)\, n(\mu, \sigma^2, x)$

Input : $p$, $\mu$, $\sigma^2$
Output : $Y$
Initialization:
  Select a density $f$ on [0, 1] and a positive constant $K$
  such that $p(\mathfrak{N}^{-1}(x)) \le K f(x)$ for all $x$ in [0, 1]
Procedure :
  repeat
    Simulate $X$ with density $f$;
    Simulate $U$ uniform on [0, 1] independent of $X$;
    Compute $Z := K U f(X)$;
  until $Z \le p(\mathfrak{N}^{-1}(X))$
Return : $Y := \mathfrak{N}^{-1}(X)$
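As a complement, here is a minimal sketch of the simpler acceptance-rejection variant described in the text, where the dominating density is $p$ itself. It is not Algorithm 3 (which works on [0, 1] through the normal c.d.f.); it relies only on the fact that $n(\mu, \sigma^2, x)$ is bounded by $1/\sqrt{2\pi\sigma^2}$, so that a draw $x$ from $p$ can be accepted with probability $\exp(-(x-\mu)^2/(2\sigma^2))$. The function name and the test case ($p$ standard normal, $\mu = 1$, $\sigma^2 = 1$) are assumptions chosen because the target is then known to be the normal law with mean 1/2 and variance 1/2.

```python
import math
import random

def sample_tilted_by_gaussian(sample_p, mu, sigma2, rng):
    """Draw one point with density proportional to p(x) * n(mu, sigma2, x).

    Proposal: p itself. A proposal x is accepted with probability
    exp(-(x - mu)**2 / (2 * sigma2)), i.e. n(mu, sigma2, x) divided by its
    maximum value 1 / sqrt(2 * pi * sigma2).
    """
    while True:
        x = sample_p(rng)
        if rng.random() <= math.exp(-(x - mu) ** 2 / (2.0 * sigma2)):
            return x

rng = random.Random(0)
draw_standard_normal = lambda r: r.gauss(0.0, 1.0)
draws = [sample_tilted_by_gaussian(draw_standard_normal, 1.0, 1.0, rng)
         for _ in range(20000)]
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
print(mean, var)
```

The acceptance rate degrades when $\sigma^2$ is small or when $\mu$ lies far in the tail of $p$, which is consistent with the remark that other techniques are needed in the large deviation range.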



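Algorithm 4: Simulation of a sample $Y_1^k$ with density $g_{u_{1,n}}$

Input : $p_X$, $\delta$, $n$, $u_{1,n}$
Output : $Y_1^k$
Initialization:
  Set $k \leftarrow k_\delta$ with Algorithm 2;
  $t_0 \leftarrow m^{-1}(m_0)$;
Procedure :
  Simulate $Y_1$ with density (24);
  $s_1^1 \leftarrow u(Y_1)$;
  for $i \leftarrow 1$ to $k - 1$ do
    $m_i \leftarrow$ (25);
    $t_i \leftarrow m^{-1}(m_i)$;
    $\alpha \leftarrow$ (27);
    $\beta \leftarrow$ (28);
    Simulate $Y_{i+1}$ with density $g(y_{i+1} | y_1^i)$ using Algorithm 3;
    $s_1^{i+1} \leftarrow s_1^i + u(Y_{i+1})$;
  end
Return : $Y_1^k$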



Remark 15 Simulation of $Y_1$ can be performed through the method suggested in [1].

Tables 5, 6, 7 and 8 present a number of simulations of random walks conditioned on their sum with n = 1000
when $u(x) = x$. In the gaussian case, when the approximating scheme is known to be optimal up to k = n - 1, the
simulation is performed with k = 999 and two cases are considered: the moderate deviation case is supposed to be
modeled when $P(S_{1,n} > na)$ = 10^^ (Table 5); that this range of probability is in the "moderate deviation" range
is a statement commonly accepted among statisticians; the large deviation case pertains to $P(S_{1,n} > na)$ ~ 10^^
(Table 6). The centered exponential case with n = 1000 and k = 900 is presented in Tables 7 and 8, under the
same events.
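In the gaussian case the conditional law of each step is explicit, which gives a convenient reference point when testing a simulator of long runs. The sketch below is not Algorithm 4: it uses the known fact that, for i.i.d. N(0,1) summands, the next step given the past and $S_{1,n} = na$ is normal with mean $(na - s_1^i)/(n-i)$ and variance $1 - 1/(n-i)$. The function name and the numerical values are illustrative assumptions.

```python
import math
import random

def gaussian_run_given_sum(n, k, a, rng):
    """First k steps of an N(0,1) random walk conditioned on S_{1,n} = n * a."""
    path = []
    partial_sum = 0.0
    for i in range(k):
        remaining = n - i                        # summands not yet drawn
        mean = (n * a - partial_sum) / remaining
        var = 1.0 - 1.0 / remaining
        x = rng.gauss(mean, math.sqrt(var))
        path.append(x)
        partial_sum += x
    return path

rng = random.Random(1)
path = gaussian_run_given_sum(n=1000, k=999, a=0.1, rng=rng)
print(len(path), sum(path) / len(path))
```

The empirical mean of the run stays close to $a$, in line with the tilted density N(a, 1) being the limit law of a single summand.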




Figure 5: Trajectories in the normal case for $P_n$ = 10^^

Figure 6: Trajectories in the normal case for $P_n$ = 10~^

Figure 7: Trajectories in the exponential case for $P_n$ = 10~^

Figure 8: Trajectories in the exponential case for $P_n$ = 10"^



In order to check the accuracy of the approximation, Tables 9, 10 (normal case, n = 1000, k = 999) and Tables
11, 12 (centered exponential case, n = 1000, k = 900) present the histograms of the simulated $X_i$'s together with the
tilted densities at point $a$ which are known to be the limit density of $X_1$ conditioned on $\mathcal{E}_n$ in the large deviation
case, and to be equivalent to the same density in the moderate deviation case, as can be deduced from Ermakov
[14]. The tilted density in the gaussian case is the normal with mean $a$ and variance 1; in the centered exponential
case the tilted density is an exponential density on $(-1, \infty)$ with parameter $1/(1 + a)$.
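The statement on the centered exponential case can be checked numerically. The sketch below assumes the usual definition of the tilted density, $\pi^a(x) = e^{tx} p(x)/\phi(t)$ with $t$ such that $m(t) = a$; for the centered exponential law, $\phi(t) = e^{-t}/(1-t)$ and $m(t) = t/(1-t)$, hence $t = a/(1+a)$. The helper names and the crude midpoint integration are illustrative choices.

```python
import math

def centered_exponential_density(x):
    """Density of E - 1 with E ~ Exp(1), supported on (-1, infinity)."""
    return math.exp(-(x + 1.0)) if x > -1.0 else 0.0

def tilted_density(x, a):
    """pi^a(x) = exp(t x) p(x) / phi(t), with t chosen so that the mean is a."""
    t = a / (1.0 + a)                      # solves m(t) = t / (1 - t) = a
    phi = math.exp(-t) / (1.0 - t)         # moment generating function at t
    return math.exp(t * x) * centered_exponential_density(x) / phi

def integrate(f, lower, upper, steps=200000):
    """Plain midpoint rule, sufficient for a sanity check."""
    h = (upper - lower) / steps
    return h * sum(f(lower + (j + 0.5) * h) for j in range(steps))

a = 0.5
mass = integrate(lambda x: tilted_density(x, a), -1.0, 60.0)
mean = integrate(lambda x: x * tilted_density(x, a), -1.0, 60.0)
rate = 1.0 / (1.0 + a)
closed_form = rate * math.exp(-rate * (0.3 + 1.0))   # exponential on (-1, inf)
print(mass, mean, tilted_density(0.3, a), closed_form)
```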






Figure 9: Histogram of the $X_i$'s in the normal case with n = 1000 and k = 999 for $P_n$ = 10 ^. The curve represents
the associated tilted density.

Figure 10: Histogram of the $X_i$'s in the normal case with n = 1000 and k = 999 for $P_n$ ~ 10 ^. The curve represents
the associated tilted density.

Figure 11: Histogram of the $X_i$'s in the exponential case with n = 1000 and k = 800 for $P_n$ = 10 ^. The curve
represents the associated tilted density.

Figure 12: Histogram of the $X_i$'s in the exponential case with n = 1000 and k = 800 for $P_n$ = 10 The curve
represents the associated tilted density.

Consider now the case when $u(x) = x^2$. Table 13 presents the case when $X$ is $N(0,1)$, n = 1000, k = 800,
$P(U_{1,n} > u_{1,n})$ ~ 10"^. We present the histograms of the $X_i$'s together with the graph of the corresponding tilted
density; when $X$ is $N(0,1)$ then $X^2$ is $\chi^2$. It is well known that when $u_{1,n}/n$ is fixed larger than 1 then the
limit distribution of $X_1$ conditioned on $(U_{1,n} = u_{1,n})$ tends to $N(0, a)$ which is the Kullback-Leibler projection of
$N(0, 1)$ on the set of all probability measures $Q$ on $\mathbb{R}$ with $\int x^2 dQ(x) = a := \lim_{n\to\infty} u_{1,n}/n$. This distribution is
precisely $g_0(y_1 | y_0)$ defined hereabove. Also consider (26); expansion using the definitions (27) and (28) proves that
as $n \to \infty$ the dominating term in $g(y_{i+1} | y_1^i)$ is precisely $N(0, m_0)$, and the terms including $y_{i+1}$ in the exponential
stemming from $n(\alpha\beta + m_0, \beta, u(y_{i+1}))$ are of order $O(1/(n - i))$; the terms depending on $y_1^i$ are of smaller order.
The fit which is observed in Table 13 is in accordance with the above statement in the LDP range (when
$\lim_{n\to\infty} u_{1,n}/n \neq 1$), and with the MDP approximation when $\lim_{n\to\infty} u_{1,n}/n = 1$ and
$\liminf_{n\to\infty} (u_{1,n} - n)/\sqrt{n} \neq 0$, following Ermakov [14].
Figure 13: Histogram of the $X_i$'s in the normal case with n = 1000, k = 800 and $u(x) = x^2$ for $P_n$ = 10 ^. The
curve represents the associated tilted density.



4 Conditioning on large sets 

Approximation of the density

$$p_{A_n}(X_1^k = Y_1^k) := p(X_1^k = Y_1^k \mid U_{1,n} \in A_n)$$

of the runs $X_1^k$ under large sets $(U_{1,n} \in A_n)$ for Borel sets $A_n$ with non void interior follows from the above results
through integration. Here, in the same vein as previously, $Y_1^k$ is generated under $P_{A_n}$. An application of this result
for the evaluation of rare event probabilities through Importance Sampling is briefly presented in the next section.
The present section pertains to the large deviation case.

4.1 Conditioning on a large set defined through the density of its dominating point 

We focus on cases when $(U_{1,n} \in A_n)$ writes $(U_{1,n}/n \in A)$ where $A$ is a fixed Borel set (independent of $n$) with
essential infimum $a$ larger than $EU$ and which can be described as a "thin" or "thick" Borel set according to its
local density at point $a$.

The starting point is the approximation of $p_{nv}$ on $\mathbb{R}^k$ for large values of $k$ under the point condition

$$U_{1,n}/n = v$$

when $v$ belongs to $A$. Denote $g_{nv}$ the corresponding approximation defined in (29). It holds

$$p_{nA}(x_1^k) = \int_A p_{nv}(X_1^k = x_1^k)\, p(U_{1,n}/n = v \mid U_{1,n} \in nA)\, dv. \qquad (39)$$

In contrast with the classical Importance Sampling approach for this problem we do not consider the dominating 
point approach but merely realize a sharp approximation of the integrand at any point of the domain A and consider 
the dominating contribution of all those distributions in the evaluation of the conditional density $p_{nA}$. A similar
point of view has been considered in [3] for sharp approximations of Laplace type integrals in . 
Turning to (39) it appears that what is needed is a sharp approximation for 

$$p(U_{1,n}/n = v \mid U_{1,n} \in nA) = \frac{p(U_{1,n}/n = v)\, 1_A(v)}{P(U_{1,n} \in nA)} \qquad (40)$$

with some uniformity for v in A. We will assume that A is bounded above in order to avoid further regularity 
assumptions on the distribution of U. 

Recall that the essential infimum essinf $A = a$ of the set $A$ with respect to the Lebesgue measure is defined
through

$$a := \inf\{x : \text{for all } \epsilon > 0,\ |[x, x + \epsilon] \cap A| > 0\}$$






with inf — oo. 

We assume that $a > -\infty$, which amounts to saying that we do not consider very thin sets (for example not
Cantor-type sets).

The density of the point $a$ in $A$ will not be measured in the ordinary way, through

$$d(a) := \lim_{\epsilon \to 0} \frac{|A \cap [a, a + \epsilon]|}{\epsilon}$$

but merely through the more appropriate quantity

$$M(t) := t \int_{A - a} e^{-ty}\, dy, \quad t > 0.$$

For any set $A$, $0 \le M(t) \le 1$. If there exists an interval $[a, a + \epsilon] \subset A$ then $\lim_{t\to\infty} M(t) = 1$. As an example, for a
self similar set $A := A_p$ defined through $A_p := \bigcup_{n \in \mathbb{Z}} p^n I_p$ where $p > 2$ and $I_p := [(p - 1)/p, 1]$ it holds $0 =$ essinf $A_p$
and $pA_p = A_p$. Consequently $M(tp) = M(t)$ for all $t > 0$; it follows that

$$\inf_{1 \le u \le p} M(u) = \liminf_{t\to\infty} M(t) < \limsup_{t\to\infty} M(t) = \sup_{1 \le u \le p} M(u).$$
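A short numerical illustration of $M(t)$ may help. The sketch below assumes that $A - a$ is given as a finite list of disjoint intervals in $[0, \infty)$, in which case $t \int_{lo}^{hi} e^{-ty} dy = e^{-t\,lo} - e^{-t\,hi}$; the self similar set is truncated to finitely many scales, which is an approximation introduced here for computability, with $p = 3$ as an arbitrary choice.

```python
import math

def M(t, intervals):
    """M(t) = t * integral of exp(-t y) over a finite union of intervals [lo, hi]."""
    return sum(math.exp(-t * lo) - math.exp(-t * hi) for lo, hi in intervals)

half_line = [(0.0, math.inf)]          # A = [a, infinity)
thick_near_a = [(0.0, 0.5)]            # A contains [a, a + 0.5]

# Truncated version of the self similar set A_p with p = 3 (scales n = -40..5).
p = 3.0
self_similar = [((p - 1.0) / p * p ** n, p ** n) for n in range(-40, 6)]

print(M(5.0, half_line))               # equals 1 for every t
print(M(100.0, thick_near_a))          # close to 1 for large t
print(M(1.0, self_similar), M(p, self_similar))   # invariance under t -> p t
```

The last line shows the scale invariance $M(tp) = M(t)$ stated above, up to the truncation error.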



Define 



Mn{t) -.^ M{nt)/t^ e-^^dy 

J A-a 



and 

?ilog(/)u(0 + log Af„(t) - nat 

for all $t > 0$ such that $\phi_U(t)$ is finite. We borrow from [2] the following results.

Define /z„(t) := (l/ri)log A/„(t) which is for all n > 1 a decreasing function of t on (0,oo) , and which is negative 
for large n. Also /ijj(t) — iJ,[(nt) and j./^ is non decreasing on (0, oo) . 

Let JI := limt^.oo A*i(0 ^-iid /i := limf_>o • Then [2] it holds 

Lemma 16 Under the above notation and hypotheses, the equation ^^^(t) = has a unique solution tn in (0,to) 
for a in (EXJ+JI, oo) where to := sup {t : (t>u(t) < oo} . Furthermore if a > EU + /i then there exists a compact set 
K C (0, to) such that tn & K for all n. 

Assume that a > E\J+^. Define ipn{t) '■= ^'„"(t) and suppose that for any A > 

V'n ( til + 



lim sup ^ — ; — - — 7 — = 1 (41) 

where tn solves ^'n{t) = in the range (0,io) ■ It can be proved that (41) holds, for example, when t — >• log M{t)/t 
is a regularly varying function at infinity with index p £ (0, 1) ,logM{t)/t £ TZp (oo); see [2], Lemma 2.2. 
We also assume 

lim sup t (log M{t)) " < oo (42) 

i— f oo 

which holds for example when log {M{t)/t) E TZp (oo) , for < p < 1. 

Theorem 2.1 in [2] provides a general result to be inserted in (40); we take the occasion to correct a misprint in 
this result. 

Theorem 17 Assume (41) and (42) together with the aforementioned conditions on the r.v. U. Then for a >
EV+^J, 

$$P(U_{1,n} \in nA) = \frac{\phi_U^n(t_n)\, M_n(t_n)\, e^{-n a t_n}}{\sqrt{\psi_n(t_n)}\,\sqrt{2\pi}}\,(1 + o(1)) \quad \text{as } n \to \infty, \qquad (43)$$

with tn satisfying '^'n{t) = provided that the function x — > P(Ui^„ G nA + x) is nonincreasing for n large enough. 
In particular, this last condition holds if 
(i) (Petrov): $A = (a, \infty)$ or $A = [a, \infty)$; in this case $M_n(t) = 1/t$; note that in this case the classical result is
slightly different, since

$$P(U_{1,n} > na) = \frac{\phi_U^n(t^a)\, e^{-n a t^a}}{t^a s(t^a) \sqrt{2\pi n}}\,(1 + o(1)) \quad \text{as } n \to \infty$$

with $m(t^a) = a$ and $a > EU$; this is readily seen to be equivalent to (43) when $A = (a, \infty)$.

(ii) $U$ has a symmetric unimodal distribution

(iii) $U$ has a strongly unimodal distribution.

The shape of A near a is reflected in the behavior of the function M{t) for large values of t. As such, the larger 
n, the more relevant is the shape of A near a. 

Note further that Mn{t)e^^*'^ = J^e~"*^dy from which we see that a plays no role in (43). Hence a can 
be replaced by any number 7 such that /^_^ e~*ydy converges. Further i„ is independent on a. The so-called 
dominating point a oi A can therefore be defined through a := lim^^oo log/^ e~^ydy. 

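In order to examine further the role played in (43) by the regularity of $A$ near its essential infimum $a$ introduce
the pointwise Holder dimension of $A$ at $a$ as

$$\delta(a) := \lim_{\epsilon \to 0} \frac{\log G(\epsilon)}{\log \epsilon}$$

where

$$G(\epsilon) := |A \cap [a, a + \epsilon]| \quad \text{for positive } \epsilon.$$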

We refer to Proposition 2.1 in [2] for a set of Abel-Tauber type results which link the properties of M{t) at infinity 
with those of G at 0. For example it follows that G(e) - e''^") (as e ^ 0) iff M{t) - ct-^^^'+^r (1 + 5{a)) (as 
t — >■ 00). Consequently if M„(t) — 1 as t — )■ 00 then M{t) ~ t as t — )■ 00 and G'(e) ~ e as e — > 0. 

Asymptotic formulas for the numerator in (40) are well known and have a long history, going back to Richter
[18]. It holds

$$p(U_{1,n}/n = v) = \frac{\sqrt{n}\, \phi_U^n(t^v)\, e^{-n v t^v}}{s(t^v) \sqrt{2\pi}}\,(1 + o(1)) \quad \text{as } n \to \infty \qquad (44)$$

with $t^v$ defined through $m(t^v) = v$.

Plugging (44) and (43) in (39) provides an expression for the density of the runs. For applications the only
relevant case is developed in the following paragraph.

4.2 Conditioning on a thick set 

In the case when $A = (a, \infty)$ or $A = [a, \infty)$ with $a > Eu(X)$ or, more generally, when $A$ is a thick set in a neighborhood of its
essential infimum (i.e. when $\lim_{t\to\infty} M(t) = 1$) a simple asymptotic evaluation for (40) when $A$ is unbounded can
be obtained. Indeed a development in the ratio yields

$$p(U_{1,n}/n = v \mid U_{1,n} > na) = nt \exp(-nt(v - a))\, 1_A(v)\,(1 + o(1)) \qquad (45)$$

with $m(t) = a$, indicating that $U_{1,n}/n$ is roughly exponentially distributed on $A$ with expectation $a + 1/nt$. This
result is used in Section 5 in order to derive estimators of some rare event probabilities through Importance sampling.

In order to get a sharp approximation for $p_{nA}(X_1^k = Y_1^k)$ it is necessary to introduce an interval $(a, a + c_n)$
which bears the principal part of the integral (39).

Let $c_n$ denote a positive sequence such that the following condition (C) holds

$$\lim_{n\to\infty} n c_n = \infty$$

$$\sup_{n \ge 1} \frac{n c_n}{n - k} < \infty,$$

and denote $c$ the current term $c_n$.
Define on $\mathbb{R}^k$ the density

$$g_{nA}(y_1^k) := \frac{n m^{-1}(a) \int_a^{a+c} g_{nv}(y_1^k)\, \exp(-n m^{-1}(a)(v - a))\, dv}{1 - \exp(-n m^{-1}(a) c)}. \qquad (46)$$
The density

$$\frac{n m^{-1}(a)\, \exp(-n m^{-1}(a)(v - a))\, 1_{(a,a+c)}(v)}{1 - \exp(-n m^{-1}(a) c)} \qquad (47)$$

which appears in (46) approximates $p(U_{1,n}/n = v \mid a < U_{1,n}/n < a + c)$. Furthermore due to Theorem 8 $g_{nv}(Y_1^k)$
approximates $p_{nv}(Y_1^k)$ when $Y_1^k$ results from sampling under $P_{nv}$. For a discussion on the maximal value of $k$ for
which a given relative accuracy is attained, see Broniatowski and Caron (2011) [5].

The variance function $V$ of the distribution of $U$ is defined on the span of $U$ through

$$v \to V(v) := s^2(m^{-1}(v)).$$

Denote (V) the condition

$$\sup_{n \ge 1} \sqrt{n} \int_a^{\infty} V'(v)\, \exp(-n m^{-1}(a)(v - a))\, dv < \infty.$$

Theorem 18 Assume (E1), (E2), (C), (V). Then for any positive $\delta < 1$, (i)

$$p_{nA}(X_1^k = Y_1^k) = g_{nA}(Y_1^k)(1 + O_{P_{nA}}(\delta_n)) \qquad (48)$$

and (ii)

$$p_{nA}(X_1^k = Y_1^k) = g_{nA}(Y_1^k)(1 + O_{G_{nA}}(\delta_n)) \qquad (49)$$

where 

dn '■= max ^e„ (log 71)^ , (exp — ric)* j . (50) 

Proof. See Appendix. ■ 

Remark: Most distributions used in statistics satisfy (V); numerous papers have focused on the properties of 
variance functions and classification of distributions, see e.g. [17] and references therein. 

Corollary 19 Under the hypotheses of Theorem 18 the total variation distance between $P_{nA}$ and $G_{nA}$ goes to 0 as
n tends to infinity, i.e.

$$\lim_{n\to\infty} \int |p_{nA}(y_1^k) - g_{nA}(y_1^k)|\, dy_1^k = 0.$$



5 Applications 

5.1 Rao-Blackwellization of estimators 

This example illustrates the role of Theorem 8 in statistical inference; the conditioning event is local, in the range
when $\lim_{n\to\infty} u_{1,n}/n = Eu(X)$.

In statistics the following situation is often met. A model $\mathcal{P}$ consists in a family of densities $p_\theta$ where the
parameter $\theta$ is supposed to belong to $\mathbb{R}^d$ and a sample of i.i.d. r.v.'s $X_1^n$ is observed, each of the $X_i$'s having
density $p_{\theta_T}$ where $\theta_T$ is unknown. A statistic $U_{1,n} := u(X_1) + ... + u(X_n)$ is observed, which usually satisfies
$\lim_{n\to\infty} u_{1,n}/n = Eu(X)$. A preliminary estimator $\hat\theta(X_1^n)$ is chosen, which may have the advantage of being easily
computable, at the cost of having poor efficiency, approaching $\theta_T$ loosely in terms of the MSE. The celebrated
Rao-Blackwell Theorem asserts that the MSE of the conditional expectation of $\hat\theta(X_1^n)$ given the observed value
$u_{1,n}$ of any statistic improves on the MSE of $\hat\theta(X_1^n)$. When $U_{1,n}$ is sufficient for $\theta$ the reduction is maximal,
leading to the unbiased minimal variance estimator for $\theta_T$ when $\hat\theta(X_1^n)$ is unbiased (Lehmann-Scheffe Theorem).

The conditional density $p_{u_{1,n}}(x_1^n) := p(X_1^n = x_1^n \mid U_{1,n} = u_{1,n})$ is usually unknown and Rao-Blackwellization
of estimators cannot be performed in many cases. Simulations of long runs of length $k = k_n$ under a proxy of
$p_{u_{1,n}}(x_1^k)$ provide an easy way to improve the preliminary estimator, averaging values of $\hat\theta(X_1^k(l))$, $1 \le l \le L$, where
the samples $X_1^k(l)$ are obtained under the approximation $g_{u_{1,n}}$ of $p_{u_{1,n}}$ and $L$ runs are performed.

Consider the Gamma density

$$f_{\rho,\theta}(x) := \frac{\theta^{-\rho}}{\Gamma(\rho)}\, x^{\rho - 1} \exp(-x/\theta) \quad \text{for } x > 0. \qquad (51)$$

As $\rho$ varies in $\mathbb{R}^+$ and $\theta$ is positive, the density belongs to an exponential family $\gamma_{r,\theta}$ with parameters $r := \rho - 1$
and $\theta$, and sufficient statistics $t(x) := \log x$ and $u(x) := x$ respectively for $r$ and $\theta$. Given an i.i.d. sample
$X_1^n := (X_1, ..., X_n)$ with density $\gamma_{r_T,\theta_T}$ the resulting sufficient statistics are respectively $T_{1,n} := \log X_1 + ... + \log X_n$
and $U_{1,n} := X_1 + ... + X_n$. We consider the parametric model $(\gamma_{r_T,\theta}, \theta > 0)$ assuming $r_T$ known.

Since Ui,„ is sufficient for the parameter 9 in gu^ ^ it can be used in order to obtain improved estimators of 9t 
through Rao Blackwellization. 

The parameter to be estimated is 0^- A first unbiased estimator is chosen as 

^ X,+X2 
"2 •= — • 

Given an i.i.d. sample X" with density jrrfiT the Rao-Blackwellised estimator of 9 is defined through 

9rb,2 :~ E (^92 Ui^n^ 

whose variance is less than Var92- 

Consider $k = 2$ in $g_{u_{1,n}}(y_1^k)$ and let $(Y_1, Y_2)$ be distributed according to $g_{u_{1,n}}(y_1^2)$. Replications of $(Y_1, Y_2)$ induce
an estimator of $\hat\theta_{RB,2}$ for fixed $u_{1,n}$. Iterating on the simulation of the runs $X_1^n$ produces for n = 100 an i.i.d.
sample of $\hat\theta_{RB,2}$'s and $Var\,\hat\theta_{RB,2}$ is estimated. The resulting variance shows a net improvement with respect to the
estimated variance of $\hat\theta_2$. It is of some interest to confront this gain in variance as the number of terms involved
in $\hat\theta_k$ increases together with $k$. As $k$ approaches $n$ the variance of $\hat\theta_k$ approaches the Cramer Rao bound. The
graph below shows the decay of the variance of $\hat\theta_k$. We note that whatever the value of $k$ the estimated value of
the variance of $\hat\theta_{RB,k}$ is constant, and is quite close to the Cramer Rao bound. This is indeed an illustration of
Lehmann-Scheffe's theorem.
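The variance gain can be reproduced in a few lines when the conditional expectation is explicit. The sketch below does not use the approximating density $g_{u_{1,n}}$: it relies on exchangeability, $E[X_1 \mid U_{1,n}] = U_{1,n}/n$, so that Rao-Blackwellizing a two-term average gives $U_{1,n}/(n\rho)$. The normalization of the preliminary estimator by $\rho$ (so that it is unbiased for the scale $\theta$ of a Gamma law with known shape $\rho$) and the numerical values are assumptions made for this illustration.

```python
import random
import statistics

def compare_estimators(rho, theta, n, repetitions, rng):
    """Empirical variances of a two-term estimator and of its Rao-Blackwellised version."""
    naive, rao_blackwell = [], []
    for _ in range(repetitions):
        sample = [rng.gammavariate(rho, theta) for _ in range(n)]
        # Preliminary estimator built on the first two observations only.
        naive.append((sample[0] + sample[1]) / (2.0 * rho))
        # E[X_1 | U_{1,n}] = U_{1,n} / n, hence the closed form below.
        rao_blackwell.append(sum(sample) / (n * rho))
    return statistics.pvariance(naive), statistics.pvariance(rao_blackwell)

rng = random.Random(2)
rho, theta, n = 2.0, 1.0, 100
var_naive, var_rb = compare_estimators(rho, theta, n, repetitions=2000, rng=rng)
cramer_rao = theta ** 2 / (n * rho)
print(var_naive, var_rb, cramer_rao)
```

In this toy case the Rao-Blackwellised variance sits at the Cramer Rao bound $\theta^2/(n\rho)$, which is the behaviour the simulated runs are meant to reproduce when no closed form is available.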



Figure 14: Variance of $\hat\theta_k$, the initial estimator (dotted line), along with the variance of $\hat\theta_{RB,k}$, the Rao-Blackwellised
estimator (solid line) with n = 100 as a function of k.



5.2 Importance Sampling for rare event probabilities 

Here we consider the application of the approximating scheme under a conditioning event defined through a large
set. Also this event is on the large deviation scale. A development of the present section is presented in [5]. Consider
the estimation of the large deviation probability for the mean of n i.i.d. r.v.'s $u(X_i)$ satisfying the conditions of
this paper. This is a benchmark problem in the study of rare events; we refer to the book by Bucklew [7] for the
background of this section.

Let $u_{1,n} := na$ for fixed $a$ larger than $Eu(X)$. The probability to be estimated is

$$P_n := P(U_{1,n} > u_{1,n}).$$

The Importance Sampling procedure substitutes the empirical estimator

$$P^{(n)} := \frac{1}{L} \sum_{l=1}^{L} 1(U_{1,n}(l) > u_{1,n}) = \frac{1}{L} \sum_{l=1}^{L} 1\left(\sum_{i=1}^{n} u(X_i(l)) > u_{1,n}\right) \qquad (52)$$

by

$$\frac{1}{L} \sum_{l=1}^{L} \frac{p_X(X_1(l)) \cdots p_X(X_n(l))}{g(X_1(l), ..., X_n(l))}\, 1\left(\sum_{i=1}^{n} u(X_i(l)) > u_{1,n}\right). \qquad (53)$$



In the above display (52) the sample $X_1^n(l)$ is generated under the i.i.d. sampling with distribution $P_X$ and the $L$
samples are i.i.d. In display (53) the sample $X_1^n(l)$ is generated under the density $g$ on $\mathbb{R}^n$ (under which the $X_i$'s
may not be independent). The $L$ samples $X_1^n(l)$ are i.i.d.
It is well known that the optimal sampling density is

$$p_{opt}(x_1^n) := p(X_1^n = x_1^n \mid U_{1,n} > u_{1,n})$$

which is not achievable, since it presumes that $P_n$ is known. This optimal sampling density produces the zero variance estimator
$P_n$ itself with $L = 1$. However approximating $p_{opt}(x_1^n)$ sharply at least on the first $k$ coordinates for large $k$ produces
a large hit rate for the Importance sampling procedure, and pushes the Importance factor towards 1.
Define the sampling density $g$ on $\mathbb{R}^n$ through

$$g(x_1^n) := g_{nA}(x_1^k) \prod_{i=k+1}^{n} \pi^a(x_i)$$

where $g_{nA}$ is defined in (46) and $\pi^a$ is the density defined in (23). The approximating density $g_{nA}$ has been used
to simulate the $k$ first $X_i$'s and the remaining $n - k$ ones are i.i.d. with the classical tilted density. The classical IS
scheme coincides with the present one with the difference that $k = 1$ and $g_{nA}(x_1) = \pi^a(x_1)$, hence simulating under
an i.i.d. sampling scheme with common density $\pi^a$.
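For reference, the classical i.i.d. tilted scheme can be sketched in the gaussian case, where both the tilted density and the exact probability are explicit. The sketch below is only the benchmark scheme, not the adaptive scheme based on $g_{nA}$; it assumes $u(x) = x$ and $X \sim N(0,1)$, so that the tilted law is $N(a, 1)$ and the likelihood ratio of a path with sum $s$ is $\exp(-as + na^2/2)$. The function name and the numerical values are illustrative.

```python
import math
import random

def tilted_is_estimate(n, a, L, rng):
    """Classical i.i.d. tilted IS estimate of P(X_1 + ... + X_n > n a), X_i ~ N(0,1)."""
    total = 0.0
    for _ in range(L):
        # Sample the whole path under the tilted law N(a, 1).
        s = sum(rng.gauss(a, 1.0) for _ in range(n))
        if s > n * a:
            # Likelihood ratio of N(0,1)^n with respect to N(a,1)^n along the path.
            total += math.exp(-a * s + n * a * a / 2.0)
    return total / L

rng = random.Random(3)
n, a = 100, 0.3
estimate = tilted_is_estimate(n, a, L=5000, rng=rng)
exact = 0.5 * math.erfc(a * math.sqrt(n) / math.sqrt(2.0))   # 1 - Phi(a sqrt(n))
print(estimate, exact)
```

About half of the simulated paths hit the target set here; the scheme studied in this section aims at raising that hit rate while keeping the importance factor nearly constant.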

Simulation under $g_{nA}$ is performed through a double step procedure: In the first step, randomize the value of
$U_{1,n}/n$ on $(a, +\infty)$ according to a proxy of its distribution conditioned on $U_{1,n} > na$; hence simulate a random
variable $S$ on $(a, +\infty)$ with density

$$p_S(s) := n m^{-1}(a)\, \exp(-n m^{-1}(a)(s - a))\, 1_{(a,+\infty)}(s). \qquad (54)$$



Then plug in $nS$ in lieu of $u_{1,n}$ in (29) and iterate. This amounts to considering each point in the target set as a
dominating point, weighted by its conditional density under $(U_{1,n} > na)$. Simulation of $S$ under (54) instead of
(47) is slightly suboptimal but much simpler. It can be proved that the MSE of the estimate of $P_n$ in this new IS
sampling scheme is reduced by a factor $\sqrt{(n - k)/n}$ with respect to the classical scheme when calculated on large
subsets of $\mathbb{R}^n$; see [5]. Figure 15 shows, in a simple case, the ratio of the empirical value of the MSE of the adaptive
estimate w.r.t. the empirical MSE of the i.i.d. twisted one, in the exponential case with $P_n$ = 10^^ and n = 100.
The value of $k$ is growing from $k = 0$ (i.i.d. twisted sample) to $k = 70$ (according to the rule presented in [5]). This
ratio stabilizes to $\sqrt{n - k}/\sqrt{n}$ for $L = 2000$. The abscissa is $k$ and the solid line is $k \to \sqrt{n - k}/\sqrt{n}$.
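The randomization step is easy to implement since (54) is a shifted exponential density and (47) is its restriction to $(a, a + c)$. The sketch below assumes that $t = m^{-1}(a)$ has already been computed (the value 1/3 corresponds to the centered exponential case with $a = 0.5$); the function names and the value of $c$ are illustrative.

```python
import math
import random

def sample_S(n, a, t, rng):
    """Density (54): a plus an exponential variable with rate n * t, t = m^{-1}(a)."""
    return a + rng.expovariate(n * t)

def sample_S_truncated(n, a, t, c, rng):
    """Density (47): the same exponential restricted to (a, a + c), by inversion."""
    rate = n * t
    u = rng.random()
    return a - math.log(1.0 - u * (1.0 - math.exp(-rate * c))) / rate

rng = random.Random(4)
n, a, t, c = 100, 0.5, 1.0 / 3.0, 0.1
draws = [sample_S(n, a, t, rng) for _ in range(20000)]
truncated = [sample_S_truncated(n, a, t, c, rng) for _ in range(20000)]
print(sum(draws) / len(draws), a + 1.0 / (n * t))
print(min(truncated), max(truncated))
```

Each simulated value of $S$ then plays the role of the conditioning value $u_{1,n}/n$ for one run, as described above.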



Figure 15: Ratio of the empirical value of the MSE of the adaptive estimate w.r.t. the empirical MSE of the i.i.d.
twisted one (dotted line) along with the true value of this ratio (solid line) as a function of k.



Remark 20 In the present context Dupuis and Wang [13] have shown that i.i.d. sampling schemes can produce
"rogue paths" which may alter the properties of the estimate, and the estimation of its variance. They consider an
i.i.d. random sample $X_1^n$ where $X_1$ has a normal distribution $N(1, 1)$ and

$$\mathcal{E}_n := \left\{ \frac{X_1 + ... + X_n}{n} \in A \right\}$$

where $A = (-\infty, a) \cup [b, +\infty)$ with $a < 1 < b$. The quantity to be estimated is $P(\mathcal{E}_n)$.

Assuming $a + b < 2$, the standard i.i.d. IS scheme introduces the dominating point $b$ and the family of i.i.d.
tilted r.v's with common $N(b, 1)$ distribution. "Rogue paths" generated under $N(b, 1)$ may hit the set $(-\infty, a)$ with
small probability under the sampling scheme, hence producing a very large Importance Factor. The resulting
variance of the estimate is very sensitive with respect to these values, as exemplified in their Table 1 p2^. Simulation
of paths according to $G_{nS}$ with $S$ defined in (54) produces by their very construction samples which yield both a hit
rate close to 100% and an Importance Factor close to $P(\mathcal{E}_n)$. We refer to [5] for discussion and examples. We also
quote that Dupuis and Wang [13] propose an adaptive tilting scheme, based on the product of the tt'"' , 
which yields an efficient IS algorithm. 

6 Appendix 

For clearness the current term a„ is denoted a in all proofs. 

6.1 Three Lemmas pertaining to the partial sum under its final value 

We state three lemmas which describe some functions of the random vector $X_1^n$ conditioned on $\mathcal{E}_n$. The r.v. $X$ is
assumed to have expectation 0 and variance 1.

Lemma 21 It holds $E_{P_{na}}(X_1) = a$, $E_{P_{na}}(X_1 X_2) = a^2 + O(1/n)$, $E_{P_{na}}(X_1^2) = s^2(t) + a^2 + O(1/n)$ where $m(t) = a$.



Proof. Using 




PS2.„ {na - x) pxi {x) _ ^i,,,. (na - x) 7r° ^ (z) 
PSi,„ [na) 7r|^^ {na) 



normalizing both 7r|^ (na — x) and TTg^ (na) and making use of a first order Edgeworth expansion in those 

expressions yields -Ep„„ (Xf) = s^(t) + a^ + (i) . A similar development for the joint density p„a(Xi = x,X.2 = y), 
with the same tilted distribution tt" produces the limit expression of Ep^^ (X1X2). ■ 



Lemma 22 Assume (El). Then (i) maxi<i</£ |TOi| = a + op^^ (e„) . Also (ii) maxi<i<fcsf, maxi<i</c /ig and 
maxi<i<fe fi\ tend in Pna probability to the variance, skewness and kurtosis of tt— where a '. — lim^_^oo a^. 



Proof, (i) Define 



Vi+i := m(ti) - a 



— a. 



n — i 



We state that 



max l^i+il =op„„ (cn) , 



(55) 



namely for all positive 5 



n 



lim P. 



-oo 








which we obtain following the proof of Kolmogorov maximal inequality. Define 



Ai := ((\Vi+i \ > 6en) and (\Vj\ < <5e„ for all j < i + 1)) . 



from which 
It holds



> 



[Vl +2{Vk~ Vi) Vi) dP,,a + I [V^ + 2 (Ffc - V,) V,) dP^a 



> / V^dPna 
JuAi 

= 5'^e\Pna[ max >(5e, 

\ 0<?</c— 1 

The third line above follows from $E V_i (V_k - V_i) = 0$ which is proved hereunder. Hence

f max >5e^ < ^""^^J^^^ - ^^_^(l + o(l)) 

where we used Lemma 21; therefore (55) holds under (E1). Direct calculation yields $E_{P_{na}}(V_i (V_k - V_i)) = 0$, which
achieves the proof of (i).

(ii) follows from (i) since $\lim_{n\to\infty} \max_{1 \le i \le k} m(t_i) = a$. ■

We also need the order of magnitude of $\max(|X_1|, ..., |X_k|)$ under $P_{na}$ which is stated in the following result.
Lemma 23 It holds $\max(|X_1|, ..., |X_n|) = O_{P_{na}}(\log n)$.

Proof. Set $|X_i| := X_i^- + X_i^+$ with $X_i^- := -\min(0, X_i)$, $X_i^+ := \max(0, X_i)$; it is enough to prove that $\max_i X_i^- =
O_{P_{na}}(\log n)$ and $\max_i X_i^+ = O_{P_{na}}(\log n)$. Since $E \exp tX$ is finite in a non void neighborhood of 0 so are $E \exp tX^-$
and $E \exp tX^+$. We hence prove the Lemma for positive r.v's $X_i$'s only.
Denote $a$ the current term of the sequence $a_n$. For all $t$ it holds



Pna 

(max(Xi, ...,X„) > t) < nPna (X„ > t) 

7r''(Si.„_i = na — u) 



7. ("""^"^^^ 



du. 



Let T be such that m(T) — a. Denote s := s{t). Center and normalize both Si.„ and Si_„_iwitli respect to the 
density tt" in the last line above, denoting tt" the density of Si.„ := (Si^„ — na) /s-y/n when X has density 7r° with 
mean a and variance s^, we get 



P„,(max(Xi,...,X„) >t) <n-^= / (X„ = ./) 



7r°_i (Si,„_i = [na - u - {n - l)a) / {s^n - l)) 



du. 



K (Si,„ = 0) 

Under the sequence of densities tt" the triangular array (Xi, ...,X„) obeys a first order Edgeworth expansion 

/n f°° 

Pn 

Vn-l Jt 
n ((a — u) I s\Jn — l) P (u, i, 7i) + o(l) 



n(0) + o(l) 

tt" (X„ u) du. 



-du 



for some constant Cst independent of n and r and 

P (m, i, n) := 1 + P3 ((a - u) /s^/n - l) 






where P3 {x) = ^ (x^ — 3x) is the third Hermite polynomial; and /is are the second and third centered moments 
of 7r°. We have used the fact that the sequence a converges to bound all moments of the tilted densities tt". We 
used uniformity upon u in the remaining term of the Edgeworth expansions. Making use of Chernoff Inequality to 
bound n° (X„ > t) , 

P„„ (max(Xi,...,X„) >t)< nCst^^i^e-^* 
for any A such that 4>{t + A) is finite. For t such that 

1 1 log n — > 00 

it holds 

P„a (max(Xi, ...,X„) < ^ 1, 

which proves the lemma. ■ 



6.2 Proof of the approximations resulting from Edgeworth expansions in Theorem 1 

We complete the calculation leading to (15) and (16). 



Set Zi+i := {rrii - Yt+\) /siy/n - i - 1. 
It then holds 



1 + 7ra^3(^«+l) + ^,P4{Z..+l) 



(56) 



3/2 



{n-i-1) 

We perform an expansion in n(Zi_|_i) up to the order 3, with a first order term n(— {si^Jn — i — l)) , 
namely 



xi{Zi+i) = n i^-Yi+i/ {si\/n - i - 1^^ 



(57) 



1 



s'f (n-i-1) ^ 2^ 



n(: 



(n-i-1) 

Y* 



1 



where Y* = — / . , (-Y,+i + OmA with l^l < 1. 

Lemmas 22 and 23 provide the orders of magnitude of the random terms in the above displays when sampling 
under Pna- 

Use those lemmas to obtain 

(a + op„„ (£«)) (58) 



and 



s\(n — i —X) n — i — I 

rrif 1 , , . ^^2 

(n — 2 — 1) n — I — 1 



Also when (A) holds then the dominant terms in the bracket in (57) are precisely those in the two displays just 
above. This yields 



n(Zj+i) = n 



i\/n — i — 1 



1 



s'f{n-i-l) 2s'f{n-i-l) 
I °P,ia log ") 



We now need a precise evaluation of the terms in the Hermite polynomials in (56). This is achieved using Lemmas 
22 and 23 which provide uniformity upon i between 1 and k = k„ in all terms depending on the sample path Yi- 
The Hermite polynomials depend upon the moments of the underlying density $\pi^{m_i}$. Since $\bar\pi^{m_i}$ has expectation 0
and variance 1 the terms corresponding to $P_1$ and $P_2$ vanish. Up to the order 4 the polynomials write

$$P_3(x) = \frac{\mu_3^i}{6 s_i^3} H_3(x), \qquad P_4(x) = \frac{(\mu_3^i)^2}{72 s_i^6} H_6(x) + \frac{\mu_4^i - 3 s_i^4}{24 s_i^4} H_4(x)$$

with $H_3(x) := x^3 - 3x$, $H_4(x) := x^4 - 6x^2 + 3$ and $H_6(x) := x^6 - 15x^4 + 45x^2 - 15$.

Using Lemma 22 it appears that the terms in , j > 3 in P3 and P4 wiU play no role in the asymptotic behavior 
in (56) with respect to the constant term in P4 and the term in x from P3 . Indeed substituting x by Zi+i and 
dividing by 71 — i — 1, the term in x"^ in P4 writes Op,^^ (logn)^ /{n — i)^ where we used Lemma 22. These terms 
are of smaller order than the term — 3.t in P3 which writes — 2s^(n-i-i) ^ ^i+i) = n-\-i ^Pna i^ogn) . 



It holds 



V?i - i - 1 _ i _ 1) 



6(s,)6(^_j_l)2 



which yields 



- {a-Y,+i) + - ■ ^Op„^(logn) . (59) 



^n-i-l 2sj{n~i-l)' [n-i-l) 
For the term of order 4 it holds 



which yields 

$$\frac{P_4(Z_{i+1})}{n-i-1} = \frac{\mu_4^i - 3 s_i^4}{8 s_i^4 (n-i-1)} - \frac{15 (\mu_3^i)^2}{72 s_i^6 (n-i-1)} + \frac{O_{P_{na_n}}((\log n)^2)}{(n-i-1)^2}. \qquad (60)$$

The fifth term in the expansion plays no role in the asymptotics.

To sum up, comparing the remainder terms in (59) and (60), we get



tt;:^{Z^+i) = n - i- 1)) .A.B + Op„ 

where A and B are given in (15) and (16). 



3/2 



6.3 Final step of the proof of Theorem 1 

We make use of the following version of the law of large numbers for triangular arrays (see [20] Theorem 3.1.3). 

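Theorem 24 Let $X_{i,n}$, $1 \le i \le k_n$, denote an array of row-wise real exchangeable r.v's and $\lim_{n\to\infty} k_n = \infty$. Let
$\rho_n := E X_{1,n} X_{2,n}$. Assume that for some finite $\Gamma$, $E X_{1,n}^2 \le \Gamma$. If for some doubly indexed sequence $(a_{i,n})$ such that

$$\lim_{n\to\infty} \sum_{i=1}^{k_n} a_{i,n}^2 = 0$$

it holds

$$\lim_{n\to\infty} \rho_n \Big( \sum_{i=1}^{k_n} a_{i,n} \Big)^2 = 0$$

then

$$\lim_{n\to\infty} \sum_{i=1}^{k_n} a_{i,n} X_{i,n} = 0$$

in probability.
Denote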



^^ - 4 , 15(/i^)' 



2sf ' 8^4 -r 7256 



Pi Ki + , '^l ^ ^7~2- 
By (13), (14) and (17)

$p(X_{i+1} = Y_{i+1} \mid S_{i+1,n} = na_n - S_{1,i})$



zTT (Xj+l = Yi+l) 7— A(l) 



-Yi 



+ 1 



n(0) 
with 

A(i) := 



1 _ ^ + O 



We perform a second order expansion in both the numerator and the denominator of the above expression, which 
yields 



A(^) ^ cxp ( ^ , - + -_3,Aen]2iJ]l\ (g^^ 

^' 2sf(n-j-l) n-i-1 n-i-1 ' ^' ^ ^ 



The term A'{i) in (61) writes 



The term exp (^^^^ + 2s'HnU-i) ) (6^) captured in g(r^+i| F/; 

^'(z) := QlQl 



with 

('^2)'' 4- 1 f M^yj + i _ aM2 ^2 ^ ^ 



■ exp ^ ^ (n— t-l)(n— 1) + 2(n— t)^ ' 2 \^ n-i-1 n-i-1 n— t-1 

and 

expi?2 

where 



(n — z — 1)^ (n — 2 — ly 



/ , \ OP„ (log"-)) ^ / ?N 

-op„„ (£„ log n)+ ° ■'^ +o(it^) 



{n — I — 1)^ {n — I — 1) 



n — z \(n — i)-''^/ \(n — j)" 



with 

Ml 

We first prove that 



(n — i)-^/^/ y — z "° \(n — z)'^/^^ 

/iiyj+i M2^^ 1^2 0P„a (gn log n) 

n — i — 1 n — i — I n — i — I n — i — 1 

$$\prod_{i=0}^{k-1} A'(i) = 1 + o_{P_{na_n}}(\epsilon_n (\log n)^2) \qquad (62)$$



as n tends to infinity. 
Since 

$$p(X_1^k = Y_1^k \mid S_{1,n} = na_n) = g_0(Y_1 \mid Y_0) \prod_{i=0}^{k-1} g(Y_{i+1} \mid Y_1^i) \prod_{i=0}^{k-1} A'(i) \prod_{i=0}^{k-1} L_i$$

where 



C,- \/n — i ( an] 



$ (ti) V« - » - 1 V " ^ « ^ 1 
the completion of the proof will follow from

$$\prod_{i=0}^{k-1} L_i = 1 + o_{P_{na_n}}(\epsilon_n (\log n)^2). \qquad (63)$$



The proof of (62) is achieved in two steps. 



Claim 25 $\prod_{i=0}^{k-1} Q_1 = 1 + o_{P_{na_n}}(\epsilon_n (\log n)^2)$.

By Lemma 22 the random terms deriving from $\pi^{m_i}$ satisfy

$$\max_{1 \le i \le k} |\mu_j^i - \mu_j| = o_{P_{na_n}}(1)$$

as $n$ tends to $\infty$, where $\mu_j$ is the $j$-th cumulant of $\pi^{a}$ and $a := \lim_{n\to\infty} a_n$ is finite. Therefore we may substitute
$\mu_j^i$ by $\mu_j$ in order to check the convergence of all subsequent series.
Developing $Q_1$, define, for any positive $\beta_1$, $\beta_2$, $\beta_3$ and $\beta_4$



1 



e„(logn) 



fe-i 

E 



k-l 



e„(lOg7l) 



(n — i — l)(n — i) 



</?i 



fc-i 



e„(lOg7l) 



(n - i - 1)2 



(/4a)2 



and 



fc-i 



At := 



e„(logn) 



(n-i - 1)2 



It clearly holds that 
Let for any positive (3r^ 



(n — i — 1)2 
lim P„, (A^,) = 1; j = l,...,4 



</32j, 
</33| 

</34i. 



fc-i 



A^ 



e„(lOg?l) ,^^g 



„i V 



(n - i - 1)2 

If $\lim_{n\to\infty} P_{na_n}(A_n^5) = 1$, then $\lim_{n\to\infty} P_{na_n}(A_n^j) = 1$, $j = 6, 7$, where



1 



[e„(logn) 
e„(logn)^ 



fe-i 

E 

fc-1 

E 



i+1 



(n — i — 1)2 



(n - z - 1)2 



</36 



</37 



Apply Theorem 24 with Xj_„ = F^+i and a,.„ = e^(\o^n)\n^i-\Y ' Lemma 21 



Hence $E_{P_{na_n}}[Y_1^2] \le \Gamma$ for some finite $\Gamma$. Further $\rho_n = O(1)$. Both conditions in Theorem 24 are fulfilled.
Indeed



lim > a^j j = lim 

rj, — ' ^ ' rj, — ^oo 



(logTi) (n — k)^ 
which holds under (E1), as holds

lim pn a„.i = lim 

n— fCJO \ ^ — ^ / n— >-c< 

\i=l / 

Therefore, for $j = 5, 6, 7$,

$$\lim_{n\to\infty} P_{na_n}(A_n^j) = 1.$$

Define for any positive $\beta_8$



(log"-)'' - 



Apply Theorem 24 with X,,„ = g^^^^ ^^^^ ^ e„(iogn)^\«-»-i)2 
It holds

$$\lim_{n\to\infty} \sum_{i=1}^{k} a_{i,n}^2 = 0$$

when (E1) holds.
By Lemma 21, 

which entails that such that EY^ < F < oo for some F. Also 

Ep„^ {yM) = (s'(0) + a) {s\Q) + a)+0 

and 

2 



■'"(e„(logn)^g(n-'-l)^) 



^ lim p„ I ^ > : 7TTT I = 

under (E1). Hence

$$\lim_{n\to\infty} P_{na_n}(A_n^8) = 1.$$

It follows that, denoting by $A_n$ the intersection of the events $A_n^j$, $j = 1, \ldots, 8$,

$$\lim_{n\to\infty} P_{na_n}(A_n) = 1.$$

To sum up, we have proved that, under (E1),

$$\prod_{i=0}^{k-1} Q_1 = 1 + o_{P_{na_n}}(\epsilon_n (\log n)^2).$$

Claim 26 $\prod_{i=0}^{k-1} Q_2 = 1 + o_{P_{na_n}}(\epsilon_n (\log n)^2)$.

This amounts to proving that the sum of the terms in $B_1$ (resp. in $B_2$) is of order $o_{P_{na_n}}(\epsilon_n (\log n)^2)$.

The four terms in the the sum of the terms in B\ arc respectively of order op^^ (e^(logn)^) j{n~k\ op^^ (e„(logn)'^) 
k), op^^ (ae„(logn)^) / (n — k) and op^^ (e„(log72)^) / {n k) using Lemma 22. The sum of the terms o (wf) is of 

order less than those ones. Assuming (E1), all those terms are $o_{P_{na_n}}(\epsilon_n (\log n)^2)$.

For the sum of terms $B_2$, by uniformity of the Edgeworth expansion with respect to $Y_1^k$ it holds $\sum_{i=1}^{k} B_2 = O_{P_{na_n}}((n-k)^{-1/2}) = o_{P_{na_n}}(\epsilon_n (\log n)^2)$ by (E1).

We now turn to the proof of (63).
Define 

4 , - 0-? 



2sf(n-i-l) 2si{n-i-l)' 
Use the classical bounds

$$1 - u + \frac{u^2}{2} - \frac{u^3}{6} \le e^{-u} \le 1 - u + \frac{u^2}{2}$$

to obtain on both sides of the above inequalities the second order approximation of $C_i^{-1}$ through integration with
respect to $p$. The upper bound yields



71 — i — 1 



from which 



1 



n-i- 1 



^/n-i f aK\ \ 1 , 

U < I . = cxp 



V?i - I - I \ n - I - I I \ — ' „ ^' h Cp 7 ^— rv7 



where the approximation term is uniform on the . 
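The classical bounds $1 - u + u^2/2 - u^3/6 \le e^{-u} \le 1 - u + u^2/2$ hold for $u \ge 0$. The following short Python sketch checks them numerically on a grid; the grid $[0, 5]$ and the helper names `lower` and `upper` are illustrative choices of ours, not part of the proof.

```python
import math

def lower(u):
    # Third-order Taylor polynomial of exp(-u); a lower bound for u >= 0
    return 1 - u + u**2 / 2 - u**3 / 6

def upper(u):
    # Second-order Taylor polynomial of exp(-u); an upper bound for u >= 0
    return 1 - u + u**2 / 2

grid = [j / 100 for j in range(0, 501)]          # u in [0, 5], step 0.01
bounds_hold = all(lower(u) <= math.exp(-u) <= upper(u) for u in grid)

# The gap between the two bounds is exactly u^3 / 6, so the approximation
# error of either bound is at most u^3 / 6.
max_excess = max(upper(u) - math.exp(-u) - u**3 / 6 for u in grid)

print(bounds_hold)          # True
print(max_excess <= 1e-12)  # True
```

Since the two bounds differ by $u^3/6$, integrating either of them against $p$ gives the same expansion up to the second order, with a remainder controlled by the third moment of the integrand variable.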



Substituting and exp y— by their expansion 1 + 2{n-i-i) + ^ („_.i_i)ii j and 1 

(^nTi-iyi + O ^ (,„_°_i):i ^ in the upper bound of Li above yields 

1 aK\ {o-iAY ( 1 



2(n-i-l) n-i-\ 2(?i - i - 1)2 V(n-i-l) 

^ - ?: - 1 

Using Lemma 22, TTif — 2a?7ii + = op^^(ae„) and therefore 



y _|_ '"1""' _ "t ' ■■"I ' " ^ Q 

n — i — 1 2s|(n — i — 1) \^ (71 — z — 1)^ 



i,: < 1 + : 7T + . ^ / + o 



1 



2(n — j — 1) n — i — I {n — i — 1)^ y (71 — i — 1)^ 



Write 



with 



77 — 7 — 1 2(n — 7 — 1) 77 — 7 — 1 

Y[l,<Y[{i + m,) 

i=l i=l 

(aK*i)2 op„^(ae„) 



(n — i — 1)2 71-7-1 



Under (E1), $\sum_{i=0}^{k-1} M_i$ is $o_{P_{na_n}}(\epsilon_n (\log n)^2)$. This closes the proof of the Theorem.

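The passage from a bound on the sum of the $M_i$ to a bound on the product can be sketched as follows. This is a sketch under the assumptions that $1 + M_i \ge 0$ for all $i$ and that $\epsilon_n (\log n)^2 \to 0$; it uses only the elementary inequality $1 + x \le e^x$.

```latex
\prod_{i=0}^{k-1} L_i \;\le\; \prod_{i=0}^{k-1} (1 + M_i)
  \;\le\; \exp\Big( \sum_{i=0}^{k-1} M_i \Big)
  \;=\; \exp\Big( o_{P_{na_n}}\big(\epsilon_n (\log n)^2\big) \Big)
  \;=\; 1 + o_{P_{na_n}}\big(\epsilon_n (\log n)^2\big).
```

The lower bound on the product is treated in the same way, starting from the lower bound in the classical inequalities.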
6.4 Proof of Theorem 18 

The following lemma (see [16], Corollary 6.4.1) provides an asymptotic formula for the tail probability of $U_{1,n}$
under the hypotheses and notation of section 3. Define

$$I_U(x) := x\, m^{-1}(x) - \log \phi_U(m^{-1}(x)).$$

Lemma 27 Under the same hypotheses as above



n J V27r^-0(a~ 



where $\psi(a) := t^a s(t^a)$.
Lemma 28 Suppose that (V) holds. Then (i) $E_{P_{nA}} U_1 = a + o(1)$, (ii) $E_{P_{nA}} U_1^2 = a^2 + s^2(t) + o(1)$ and (iii)
$E_{P_{nA}} U_1 U_2 = a^2 + o(1)$, where $m(t) = a$.

Proof. It holds

$$E_{P_{nA}} U_1 = \int_a^\infty (E_{P_{nv}} U_1)\, p(U_{1,n}/n = v \mid U_{1,n} > na)\, dv.$$

Integration by parts yields

$$E_{P_{nA}} U_1 = a + \int_a^\infty P(U_{1,n}/n > v \mid U_{1,n} > na)\, dv.$$

Using Lemma 27 and Chernoff inequality,

$$\int_a^\infty P(U_{1,n}/n > v \mid U_{1,n} > na)\, dv \le \sqrt{2\pi}\, \psi(a) \sqrt{n} \int_a^\infty \exp[n (I_U(a) - I_U(v))]\, dv$$

where $\psi(a) = t\, s(t)$.

Finally, using $I_U(v) \ge I_U'(a)\, v + I_U(a) - a\, I_U'(a)$, and integrating, the right-hand side tends to 0 as $n \to \infty$.

Hence, $E_{P_{nA}} U_1 = a + o(1)$.
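The integration step can be written out explicitly. The following is a sketch assuming that $I_U$ is convex and differentiable at $a$ with $I_U'(a) > 0$, so that the tangent-line bound $I_U(v) - I_U(a) \ge I_U'(a)(v - a)$ applies for $v \ge a$.

```latex
\int_a^\infty \exp\bigl[n\,(I_U(a) - I_U(v))\bigr]\,dv
  \;\le\; \int_a^\infty \exp\bigl[-n\,I_U'(a)\,(v-a)\bigr]\,dv
  \;=\; \frac{1}{n\,I_U'(a)},
\qquad\text{hence}\qquad
\sqrt{2\pi}\,\psi(a)\,\sqrt{n}\int_a^\infty \exp\bigl[n\,(I_U(a) - I_U(v))\bigr]\,dv
  \;\le\; \frac{\sqrt{2\pi}\,\psi(a)}{\sqrt{n}\,I_U'(a)} \;\longrightarrow\; 0 .
```

The correction to $a$ in $E_{P_{nA}} U_1$ is therefore of order $n^{-1/2}$ at most under these assumptions.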

Insert Ep^^Vf = + s^j {t) + O (i) in 

$$E_{P_{nA}} U_1^2 = \int_a^\infty (E_{P_{nv}} U_1^2)\, p(U_{1,n}/n = v \mid U_{1,n} > na)\, dv.$$

Firstly, through integration by parts, Lemma 13 and Chernoff inequality,

/>oo 

/ v'^p {XJi^n/n — v\ Ui_„ > na) dv ~ + 

J a 

Secondly 

V{v)p ( JJi^n/n — v\ Ui^n > na) dv = 



s'^{t) + 2 / v'{v)P{Vi^n/n>v\Vi,n>na)dv 

J a 

which tends to $s^2(t)$ as $n \to \infty$, using again Chernoff inequality, condition (V) and Lemma 13.

The third term is handled similarly due to the fact that the $O(1/n)$ consists of a sum of powers of $v$.
The proof of (iii) is similar to the above ones. ■

Lemma 28 yields the maximal inequality stated in Lemma 22 under the condition $(U_{1,n} > na)$. We also need
the order of magnitude of the maximum of $(|U_1|, \ldots, |U_k|)$ under $P_{nA}$, which is stated in the following result.

Lemma 29 It holds

$$\max(|U_1|, \ldots, |U_n|) = O_{P_{nA}}(\log n).$$

Proof. Using the same argument as in Lemma 23 we consider the case when the r.v's $U_i$ take non negative
values. We prove that

$$\lim_{n\to\infty} P_{nA}(\max(U_1, \ldots, U_n) > t_n) = 0$$

when

$$\lim_{n\to\infty} \frac{t_n}{\log n} = \infty.$$

For fixed $d$ it holds

$$P_{nA}(\max(U_1, \ldots, U_n) > t_n) = \int_a^{a+d} P(\max(U_1, \ldots, U_n) > t_n \mid U_{1,n}/n = v)\, p(U_{1,n}/n = v \mid U_{1,n}/n > a)\, dv$$

$$+ \int_{a+d}^\infty P(\max(U_1, \ldots, U_n) > t_n \mid U_{1,n}/n = v)\, p(U_{1,n}/n = v \mid U_{1,n}/n > a)\, dv =: I + II.$$

Now

$$II \le \frac{P(U_{1,n}/n > a + d)}{P(U_{1,n}/n > a)}$$

which tends to 0 by Lemma 27.

Furthermore by Lemma 23, $\lim_{n\to\infty} P(\max(U_1, \ldots, U_n) > t_n \mid U_{1,n}/n = v) =: \lim_{n\to\infty} r_n = 0$ when $v \in (a, a + d)$. Hence

$$I \le r_n (1 + o(1)) \to 0.$$

This proves the Lemma. ■
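The logarithmic order of the maximum can be illustrated by simulation. The sketch below is only an illustration under a simplifying assumption: it draws i.i.d. standard exponential variables under the unconditional law, not under the conditional law $P_{nA}$ of the lemma, and the helper name `max_over_log` is ours. For light-tailed summands such as these, the ratio of the sample maximum to $\log n$ stays bounded as $n$ grows.

```python
import math
import random

def max_over_log(n, rng):
    """Ratio max(U_1, ..., U_n) / log(n) for n i.i.d. Exp(1) draws (unconditional law)."""
    sample_max = max(rng.expovariate(1.0) for _ in range(n))
    return sample_max / math.log(n)

rng = random.Random(0)                      # fixed seed for reproducibility
ratios = {n: max_over_log(n, rng) for n in (10**3, 10**4, 10**5)}
for n, r in ratios.items():
    print(n, round(r, 3))                   # ratios remain of order 1
```

For Exp(1) variables the maximum equals $\log n$ plus a fluctuation of order 1, which is why the printed ratios stay close to 1; any sequence $t_n$ with $t_n / \log n \to \infty$ is eventually exceeded with vanishing probability.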
We now prove (48).

Step 1. We first prove that the integral (39) can be reduced to its principal part, namely that

$$p_{nA}(Y_1^k) = (1 + o_{P_{nA}}(1)) \int_a^{a+c} p(X_1^k = Y_1^k \mid U_{1,n}/n = v)\, p(U_{1,n}/n = v \mid U_{1,n} > na)\, dv \qquad (64)$$

holds for any fixed $c > 0$.

Apply Bayes formula to obtain 



{n-k) 



— k n—k 



t - !^ ] ] dt 



P(Ui,„ > na) 



where Ui.k ■= ^ 



Denote 



with 



kUi, 



Then (64) holds whenever $I \to 0$ (under $P_{nA}$).
Under $P_{nA}$ it holds



Ui.n = a + Op„ 4 



A result similar to Lemma 22 holds under condition $(U_{1,n} > na)$, using Lemma 28; namely it holds



max K7i+i,„ = a + op„^ (e„) . 

Using both results

$$m_k = a + O_{P_{nA}}(v_n) \qquad (65)$$

with $v_n = \max\left(\epsilon_n, \frac{1}{(n-k)\, m^{-1}(a)}\right)$, which tends to 0.

We now prove that $I \to 0$. Using once more Lemma 27 yields



J. _ m ^{mk)s[m ^ (rrife)) 



0) 



exp - (n - fc) ( lu ( mfc + ^ _ ^ ) - Iv {nik) 



Now by convexity of the function $I_U$



exp -{n- k) [ lu [ nik + ^ ) - /u {mk] 



< exp—ncm ^{mk) = exp—nc 



V(a + 90p^^[Vn)) 
for some $\theta$ in $(0, 1)$. Therefore the upper bound above tends to 0 under $P_{nA}$ when (C) holds. By monotonicity
of $t \to m(t)$ and condition (C) the ratio in $I$ is bounded.
We have proved that

$$I = O_{P_{nA}}(\exp -nc).$$

Step 2. We claim that (48) holds uniformly in $v$ in $(a, a + c)$ when $Y_1^k$ is generated under $P_{nA}$. This result
follows from an argument similar to the one used in Theorem 8, where (48) is proved under the local sampling $P_{nv}$. A close
look at the proof shows that (48) holds whenever Lemmas 22 and 23, stated for the variables $U_i$'s instead of $X_i$'s,
hold under $P_{nA}$. Those lemmas are substituted by Lemmas 28 and 29 above.

Inserting (48) in (64) yields

$$p_{nA}(Y_1^k) = \left( \int_a^{a+c} g_{nv}(Y_1^k)\, p(U_{1,n}/n = v \mid U_{1,n} > na)\, dv \right) \left( 1 + O_{P_{nA}}\left( \max\left( \epsilon_n (\log n)^2, (\exp -nc)^{\delta} \right) \right) \right)$$

for some $\delta < 1$.

The conditional density of $U_{1,n}/n$ given $(U_{1,n} > na)$ is given in (45), which holds uniformly in $v$ on $(a, a + c)$.



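Summing up we have proved

$$p_{nA}(Y_1^k) = \left( n\, m^{-1}(a) \int_a^{a+c} g_{nv}(Y_1^k) \exp\left( -n\, m^{-1}(a) (v - a) \right) dv \right) \left( 1 + O_{P_{nA}}\left( \max\left( \epsilon_n (\log n)^2, (\exp -nc)^{\delta} \right) \right) \right)$$

as $n \to \infty$ for any positive $\delta < 1$.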

In order to get the approximation of $p_{nA}$ by the density $g_{nA}$ it is enough to observe that

$$n\, m^{-1}(a) \int_a^{a+c} g_{nv}(Y_1^k) \exp\left( -n\, m^{-1}(a) (v - a) \right) dv = 1 + O_{P_{nA}}(\exp -nc)$$

as $n \to \infty$, which completes the proof of (48). The proof of (49) follows from (48) and Lemma 6.
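The exponentially small remainder comes from truncating the exponential weight to $(a, a + c)$. A short computation makes this explicit; it is a sketch, writing $t := m^{-1}(a)$ (our shorthand) and assuming $t > 0$.

```latex
n\,t \int_a^{a+c} \exp\bigl(-n\,t\,(v-a)\bigr)\,dv \;=\; 1 - \exp(-n\,t\,c),
\qquad
n\,t \int_{a+c}^{\infty} \exp\bigl(-n\,t\,(v-a)\bigr)\,dv \;=\; \exp(-n\,t\,c).
```

The weight $n\,t \exp(-n\,t\,(v-a))$ is thus a probability density on $(a, \infty)$ whose mass outside $(a, a+c)$ decays exponentially in $n$ for any fixed $c > 0$.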